Breaking the coherence barrier: asymptotic incoherence and 
asymptotic sparsity in compressed sensing 

B. Adcock A. Hansen C. Poon B. Roman 

Purdue Univ. Univ. of Cambridge Univ. of Cambridge Univ. of Cambridge 

1 Introduction 

In this paper we bridge the substantial gap between existing compressed sensing theory and its
current use in real-world applications.

We do so by introducing a new mathematical framework for overcoming the so-called coherence
barrier. Our framework generalizes the three traditional pillars of compressed sensing (namely,
sparsity, incoherence and uniform random subsampling) to three new concepts: asymptotic
sparsity, asymptotic incoherence and multilevel random subsampling. As we explain, asymptotic
sparsity and asymptotic incoherence are more representative of real-world problems (e.g. imaging)
than the usual assumptions of sparsity and incoherence. For instance, problems in Magnetic 
Resonance Imaging (MRI) are both asymptotically sparse and asymptotically incoherent, and 
hence amenable to our framework. 

The second important contribution of the paper is an analysis of a novel and intriguing effect 
that occurs in asymptotically sparse and asymptotically incoherent problems. Namely, the success 
of compressed sensing is resolution dependent. 

As suggested by their names, asymptotic incoherence and asymptotic sparsity are only truly 
witnessed for reasonably large problem sizes. When the problem size is small, there is little to be 
gained from compressed sensing over classical linear reconstruction techniques. However, as we 
show in this paper, once the resolution of the problem is sufficiently large, compressed sensing can 
and will offer a substantial advantage. This is so-called resolution dependence. 

This phenomenon has the following two important consequences, which are also summarized
in Figure 1:

(i) Suppose one considers a compressed sensing experiment where the sampling device, the 
object to be recovered, the sampling strategy and subsampling percentage are all fixed, but the 
resolution is allowed to vary. Resolution dependence means that a compressed sensing reconstruc- 
tion done at high resolutions (e.g. 2048 x 2048) will yield much higher quality when compared to 
full sampling than one done at a low resolution (e.g. 256 x 256). This phenomenon has an impor- 
tant consequence for practitioners investigating the usefulness of compressed sensing algorithms. 
A scientist carrying out an experiment at low resolution may well conclude that compressed sens- 
ing imparts limited benefits. However, a markedly different conclusion would be reached if the 
same experiment were to be performed at higher resolution. 

(ii) Suppose we conduct a similar experiment, but we now use the same total number of samples
$N$ (instead of the same percentage) at low resolution as we take at high resolution. Intriguingly,
the above result still holds: namely, the higher resolution reconstruction will yield substantially 
better results. This is true because the multilevel random sampling strategy successfully exploits 
asymptotic sparsity and asymptotic incoherence. Thus, with the same amount of total effort, i.e. 
the number of measurements, compressed sensing with multilevel sampling works as a resolution 
enhancer: it allows one to recover the fine details of an image in a way that is not possible with 
the lower resolution reconstruction. 
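The sample budget in (ii) can be checked with simple arithmetic. The sketch below uses the illustrative sizes quoted in Figure 1 (512 x 512 fully sampled versus 2048 x 2048 subsampled); the helper name `subsampling_fraction` is hypothetical and only serves the illustration.

```python
def subsampling_fraction(num_samples: int, side: int) -> float:
    """Fraction of a side x side grid covered by num_samples measurements."""
    return num_samples / (side * side)

# Full sampling at 512 x 512 uses every one of its grid points.
budget = 512 * 512

# The same budget spent on a 2048 x 2048 grid amounts to 16-fold subsampling.
fraction_high = subsampling_fraction(budget, 2048)
print(budget, fraction_high)
```

In other words, the high-resolution experiment in (ii) keeps 6.25% of the available samples while using exactly as many measurements as the fully sampled low-resolution experiment.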

On a broader note, resolution dependence and its consequences suggest the following advisory 
for practitioners: it is critical that simulations with compressed sensing be carried out with a 



This paper is part of a larger project on subsampling in applications. Further details, as well as codes and
numerical examples, can be found on the project website http://subsample.org






[Figure 1 (flowchart): Results of asymptotic sparsity and asymptotic incoherence. Common setting: same object to be recovered (e.g. brain image) and same sampling device (e.g. MRI machine).
(i) Same subsampling fraction and strategy (e.g. 5%, multilevel, Gaussian law). Low resolution (e.g. 256x256): poor compressed sensing reconstruction compared to full sampling (256x256). High resolution (e.g. 4096x4096): high quality compressed sensing reconstruction compared to full sampling (4096x4096).
(ii) Same number of samples (e.g. 262144 samples). Full sampling at low resolution (512x512 = 262144 samples): fine details are unclear or even lost (e.g. small tumors). Subsampling at high resolution (262144 samples from 2048x2048): fine details are identifiable or even crisp.]

Figure 1: This illustrates one of the main messages of the paper: the success of compressed sensing
is resolution dependent. (i) demonstrates how two identical experiments may give wildly different
outcomes when performed with the same subsampling percentage at different resolution levels.
(ii) demonstrates how the same number of samples can give dramatically different outcomes at
different resolution levels. In particular, compressed sensing serves as a resolution enhancer.



careful understanding of the influence of the problem resolution. Naive simulations with stan- 
dard, low-resolution test images may very well lead to incorrect conclusions about the efficacy of 
compressed sensing as a tool for image reconstruction. 

An important application of our work is the problem of MRI, which turns out to be a highly 
coherent problem. MRI served as one of the original motivations for compressed sensing, and 
continues to be a topic of substantial research. Some of the earliest work on this problem, in
particular the research of Lustig et al. [33, 34, 35], demonstrated that, due to the high coherence,
the standard random sampling strategies of compressed sensing theory lead to highly substandard
reconstructions. On the other hand, random sampling according to some nonuniform density was
shown empirically to lead to substantially improved reconstruction quality. Since the work of
Lustig et al. these observations have been confirmed in numerous other investigations
[34, 35, 38, 39, 45], and it is now standard in MR applications to use some sort of variable density
strategy to overcome the coherence barrier.

This work has culminated in the extremely successful application of compressed sensing to 
MRI. However, a mathematical theory addressing these sampling strategies is largely lacking. 
Despite some recent work [31] (see Section [s] for a discussion), a substantial gap exists between 
the standard theorems of compressed sensing and its implementation in such problems. 

The purpose of this paper is to introduce a mathematical foundation for compressed sensing 
for coherent problems and to rigorously show that the coherence barrier can be broken. In doing 
so, we provide a firm theoretical basis for the above empirical studies demonstrating the success 
of nonuniform density sampling. In addition, our main results give insight into how to design 
efficient subsampling techniques based on multilevel strategies.

Whilst the MR problem will serve as our main application, we stress that our theory is
extremely general in that it holds for almost arbitrary sampling and sparsity systems. As we
demonstrate, standard compressed sensing results, such as those of Candes, Romberg & Tao [14] and
Candes & Plan [12], are specific instances of our main theorems.

Another facet to our work is that we shall present theorems that cover not only the case of 
signals and images modelled as vectors in finite-dimensional vector spaces, but also elements of 






separable Hilbert spaces. This continues the work of Adcock & Hansen on infinite-dimensional
compressed sensing. In Section 2.2 we explain the importance of this generalisation. 



2 Background 

2.1 Compressed sensing 

Compressed sensing, introduced by Candes, Romberg & Tao [14] and Donoho [21], has been
one of the major developments in applied mathematics in the last decade. By
exploiting a particular signal structure, namely sparsity, it allows one to circumvent the traditional 
barriers of sampling theory (e.g. the Nyquist rate) and recover signals and images from far fewer 
measurements than was classically considered possible. The list of applications of compressed 
sensing is diverse and growing, and includes medical imaging, analog-to-digital conversion, radar, 
and astronomy. 

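A typical setup in compressed sensing, and that which we shall in part follow in this paper, is
as follows. Let $\{\psi_j\}_{j=1}^N$ and $\{\varphi_j\}_{j=1}^N$ be two orthonormal bases of $\mathbb{C}^N$, the sampling and sparsity
bases respectively, and write

$$U = (u_{ij})_{i,j=1}^N \in \mathbb{C}^{N \times N}, \qquad u_{ij} = \langle \varphi_j, \psi_i \rangle.$$

Note that $U$ is an isometry.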

Definition 2.1. Let $U = (u_{ij})_{i,j=1}^N \in \mathbb{C}^{N \times N}$ be an isometry. The coherence of $U$ is precisely

$$\mu(U) = \max_{i,j=1,\ldots,N} |u_{ij}|^2 \in [N^{-1}, 1]. \qquad (2.1)$$

We say that $U$ is perfectly incoherent if $\mu(U) = N^{-1}$.
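As a quick numerical check of Definition 2.1, the following sketch evaluates (2.1) for two extreme cases: the unitary DFT matrix, which attains the lower bound $N^{-1}$, and the identity matrix, which attains the upper bound 1. The function names `coherence` and `unitary_dft` are illustrative choices, not notation from the paper.

```python
import numpy as np

def coherence(U: np.ndarray) -> float:
    """Coherence mu(U) = max_{i,j} |u_ij|^2 of a matrix U, as in (2.1)."""
    return float(np.max(np.abs(U) ** 2))

def unitary_dft(N: int) -> np.ndarray:
    """N x N discrete Fourier transform matrix, scaled to be unitary."""
    idx = np.arange(N)
    return np.exp(-2j * np.pi * np.outer(idx, idx) / N) / np.sqrt(N)

N = 64
mu_dft = coherence(unitary_dft(N))   # every entry has modulus N^{-1/2}
mu_eye = coherence(np.eye(N))        # sampling basis equal to the sparsity basis
print(mu_dft, 1 / N, mu_eye)
```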

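A signal $f \in \mathbb{C}^N$ is said to be $s$-sparse in the orthonormal basis $\{\varphi_j\}_{j=1}^N$ if at most $s$ of its
coefficients in this basis are nonzero. In other words, $f = \sum_{j=1}^N x_j \varphi_j$, and the vector $x \in \mathbb{C}^N$
satisfies $|\mathrm{supp}(x)| \le s$, where

$$\mathrm{supp}(x) = \{ j : x_j \neq 0 \}.$$

Let $f \in \mathbb{C}^N$ be $s$-sparse in $\{\varphi_j\}_{j=1}^N$, and suppose we have access to the samples

$$\hat{f}_j = \langle f, \psi_j \rangle, \qquad j = 1, \ldots, N.$$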

Let $\Omega \subseteq \{1, \ldots, N\}$ be of cardinality $m$ and chosen uniformly at random. According to a result
of Candes & Plan [12] and Adcock & Hansen [2], $f$ can be recovered exactly with probability
exceeding $1 - \epsilon$ from the subset of measurements $\{\hat{f}_j : j \in \Omega\}$, provided

$$m \gtrsim \mu(U) \cdot N \cdot s \cdot (1 + \log(\epsilon^{-1})) \cdot \log N \qquad (2.2)$$

(here and elsewhere in this paper we shall use the notation $a \gtrsim b$ to mean that there exists a
constant $C > 0$ independent of all relevant parameters such that $a \ge Cb$). In practice, recovery is
achieved by solving the following convex optimization problem:

$$\min_{\eta \in \mathbb{C}^N} \|\eta\|_{l^1} \ \text{subject to} \ P_\Omega U \eta = P_\Omega \hat{f}, \qquad (2.3)$$

where $\hat{f} = (\hat{f}_1, \ldots, \hat{f}_N)^\top$ and $P_\Omega \in \mathbb{C}^{N \times N}$ is the diagonal projection matrix with $j$th entry 1 if
$j \in \Omega$ and zero otherwise.

The key estimate (2.2) shows that the number of measurements $m$ required is, up to a log
factor, on the order of the sparsity $s$, provided the coherence $\mu(U) = O(N^{-1})$. This is the case,
for example, when $U$ is the DFT matrix, a problem which was studied in some of the first papers
on compressed sensing [14] (this example is actually perfectly incoherent).
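To get a feel for the estimate (2.2), the sketch below evaluates its right-hand side for a perfectly incoherent and a fully coherent matrix. The constant hidden in the estimate is not specified in the text, so setting `C = 1.0` is purely an assumption made for illustration, as are the values of `N`, `s` and `eps`.

```python
import math

def uniform_sampling_bound(mu: float, N: int, s: int, eps: float, C: float = 1.0) -> float:
    """Right-hand side of (2.2), with the unspecified constant set to C (an assumption)."""
    return C * mu * N * s * (1.0 + math.log(1.0 / eps)) * math.log(N)

N, s, eps = 4096, 20, 0.05
incoherent = uniform_sampling_bound(1.0 / N, N, s, eps)  # mu(U) = 1/N, e.g. the DFT matrix
coherent = uniform_sampling_bound(1.0, N, s, eps)        # mu(U) = 1, the worst case
print(round(incoherent), round(coherent), coherent > N)
```

With $\mu(U) = N^{-1}$ the estimate scales with $s$ up to log factors, whereas with $\mu(U) = 1$ it exceeds $N$ itself, so uniform random subsampling offers no saving.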






2.2 Continuous/infinite-dimensional problems 

The framework of the previous section is suitable for many problems. However, there are some
important problems where this framework can lead to significant errors, since the underlying
problem is continuous, and hence not well represented by a finite-dimensional, vector space model.
To address this issue, a theory of compressed sensing in infinite dimensions was
introduced by Adcock & Hansen in [2], based on a new approach to classical sampling known as
generalized sampling [3, 5, 6]. For implementation of a continuous/infinite-dimensional model
of MRI using $l^1$ optimisation see [29].

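Let us now describe the framework of [2] in more detail. Suppose that $\mathcal{H}$ is a separable Hilbert
space over $\mathbb{C}$, and let $\{\psi_j\}_{j \in \mathbb{N}}$ be an orthonormal basis on $\mathcal{H}$ (the sampling basis). Let $\{\varphi_j\}_{j \in \mathbb{N}}$
be an orthonormal system in $\mathcal{H}$ (the sparsity system), and suppose that

$$U = (u_{ij})_{i,j \in \mathbb{N}}, \qquad u_{ij} = \langle \varphi_j, \psi_i \rangle, \qquad (2.4)$$

is an infinite matrix. We may consider $U$ as an element of $\mathcal{B}(l^2(\mathbb{N}))$, the space of bounded
operators on $l^2(\mathbb{N})$ (throughout this paper we will make no distinction between bounded operators
on sequence spaces and infinite matrices). As in the finite-dimensional case, $U$ is an isometry, and
we may define its coherence $\mu(U) \in (0,1]$ analogously to (2.1). Note, however, that $\mu(U)$ can be
arbitrarily small in infinite dimensions.

We say that an element $f \in \mathcal{H}$ is $(s, M)$-sparse with respect to $\{\varphi_j\}_{j \in \mathbb{N}}$, where $s, M \in \mathbb{N}$,
$s \le M$, if the following holds:

$$f = \sum_{j \in \mathbb{N}} x_j \varphi_j, \qquad \mathrm{supp}(x) = \{ j : x_j \neq 0 \} \subseteq \{1, \ldots, M\}, \qquad |\mathrm{supp}(x)| \le s.$$

Setting

$$\Sigma_{s,M} = \{ x \in l^2(\mathbb{N}) : x \text{ is } (s, M)\text{-sparse} \},$$

and

$$\sigma_{s,M}(f) = \min_{\eta \in \Sigma_{s,M}} \|x - \eta\|_{l^1}, \qquad f = \sum_{j \in \mathbb{N}} x_j \varphi_j, \quad x = (x_j)_{j \in \mathbb{N}} \in l^1(\mathbb{N}),$$

we say that $f$ is $(s, M)$-compressible with respect to $\{\varphi_j\}_{j \in \mathbb{N}}$ if $\sigma_{s,M}(f)$ is small. Whenever $f$ is
$(s, M)$-sparse or compressible, we seek to recover it from a small number of the measurements

$$\hat{f}_j = \langle f, \psi_j \rangle, \qquad j \in \mathbb{N}.$$

To do this, we introduce a second parameter $N \in \mathbb{N}$, and let $\Omega$ be a randomly chosen subset of
indices $1, \ldots, N$ of size $m$. In the absence of noise we now solve

$$\inf_{\eta \in l^1(\mathbb{N})} \|\eta\|_{l^1} \ \text{subject to} \ P_\Omega U P_M \eta = P_\Omega \hat{f}, \qquad (2.5)$$

where $\hat{f} = (\hat{f}_j)_{j \in \mathbb{N}} \in l^2(\mathbb{N})$ and $P_\Omega$ is the projection operator corresponding to the index set $\Omega$.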



In [2] it was proved that any solution to (2.5) recovers $f$ exactly up to an error determined by
$\sigma_{s,M}(f)$, provided $N$ and $m$ satisfy an appropriate balancing property with respect to $M$ and $s$
(see Definition 4.3), and provided

$$m \gtrsim \mu(U) \cdot N \cdot s \cdot (1 + \log(\epsilon^{-1})) \cdot \log\left(m^{-1} M N \sqrt{s}\right).$$

As in the finite-dimensional case (which is a corollary of the result), we find that $m$ is on the order
of the sparsity $s$ whenever $\mu(U)$ is sufficiently small.

We shall discuss this result, and in particular, the nature of the balancing property, in more
detail in §4.1.2. However, we remark in passing that it is usually not sufficient to take $N = M$,
and doing so can quite easily lead to substantial errors [2]. In general, one requires $N > M$ for
the balancing property to hold.

Note that this framework generalizes finite-dimensional compressed sensing theory in a natural
way (vector spaces are replaced by separable Hilbert spaces) and known results, such as that
described in Section 2.1, are straightforward corollaries of theorems proved in [2]. We shall proceed
in a similar manner in this paper: finite-dimensional theorems will be corollaries of those pertaining
to the infinite-dimensional case.






2.3 The coherence barrier 



In either finite- or infinite-dimensional compressed sensing, the number of measurements required
is, up to a log factor, on the order of the sparsity $s$ multiplied by $\mu(U) N$. When the coherence
(or mutual coherence, as it is often known [22, 23]) is small, the energy of the signal is sufficiently
spread out amongst its samples to allow for recovery using only $O(s)$ measurements. On the
other hand, when $\mu(U)$ is large, one cannot expect to reconstruct an $s$-sparse vector $f$ from highly
subsampled measurements, regardless of the recovery algorithm employed [12]. We refer to this
as the coherence barrier.

The MRI problem is an important instance of this barrier: 

Example 2.1 (Finite-dimensional MRI model) Suppose that $f \in \mathbb{C}^N$ is sparse in a discrete
wavelet basis $\{\varphi_j\}_{j=1}^N$, and let $\{\psi_j\}_{j=1}^N$ be the rows of the $N \times N$ discrete Fourier transform (DFT)
matrix. In this case, the matrix $U = \mathrm{DFT} \cdot \mathrm{DWT}^{-1}$ satisfies $\mu(U) = O(1)$ for any $N$ [13, 31].
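The claim of Example 2.1 can be checked numerically for one particular wavelet. The sketch below builds an orthonormal Haar matrix (a choice made here for simplicity; the example itself allows any discrete wavelet basis) and confirms that the coherence of $U = \mathrm{DFT} \cdot \mathrm{DWT}^{-1}$ does not decay with $N$. The helper names are hypothetical.

```python
import numpy as np

def haar_matrix(N: int) -> np.ndarray:
    """Orthonormal Haar analysis matrix (rows are Haar basis vectors); N a power of 2."""
    H = np.array([[1.0]])
    while H.shape[0] < N:
        n = H.shape[0]
        coarse = np.kron(H, [1.0, 1.0])          # scaling function and coarser wavelets
        fine = np.kron(np.eye(n), [1.0, -1.0])   # wavelets at the newly added finest scale
        H = np.vstack([coarse, fine]) / np.sqrt(2.0)
    return H

def unitary_dft(N: int) -> np.ndarray:
    """N x N discrete Fourier transform matrix, scaled to be unitary."""
    idx = np.arange(N)
    return np.exp(-2j * np.pi * np.outer(idx, idx) / N) / np.sqrt(N)

def fourier_haar(N: int) -> np.ndarray:
    """U = DFT * DWT^{-1}; the inverse of the orthonormal Haar matrix is its transpose."""
    return unitary_dft(N) @ haar_matrix(N).T

coherences = {N: float(np.max(np.abs(fourier_haar(N)) ** 2)) for N in (16, 64, 256)}
print(coherences)
```

The maximum is attained by the pairing of the zero-frequency Fourier vector with the constant Haar scaling vector, which are identical, so the coherence equals 1 at every size tested.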

Example 2.2 (Infinite-dimensional MRI model) Let $f \in L^2(\mathbb{R})$ have compact support, and
suppose that $f$ is sparse in an orthonormal system of compactly supported wavelets $\{\varphi_j\}_{j \in \mathbb{N}}$. Let
$\{\psi_j\}_{j \in \mathbb{Z}}$ be the standard Fourier basis, i.e. $\psi_j(x) = \sqrt{\epsilon}\, e^{2\pi i j \epsilon x}$, $j \in \mathbb{Z}$ (note that we enumerate over
$\mathbb{Z}$ as opposed to $\mathbb{N}$ in this case) for suitable $\epsilon > 0$ (see Section 6.4 for details on this construction).
In this case, $\mu(U) = O(1)$ for any such wavelet basis (Theorem 3.2). For a continuous/infinite-dimensional
model of MRI using $l^1$ optimisation see [29].

These two problems will serve as our main examples throughout the paper. 



3 New concepts 

We now discuss the main concepts of the paper: namely, asymptotic incoherence, asymptotic
sparsity and multilevel sampling. We shall work primarily in the infinite-dimensional setting of
§2.2, with the finite-dimensional case being a straightforward corollary.



3.1 Asymptotic incoherence 



Consider Example 2.2 in the case where $\mathcal{H} = L^2(0, 1)$, $\epsilon = 1$ and $\{\varphi_j\}_{j \in \mathbb{N}}$ is the Haar wavelet
basis. Note that the coherence $\mu(U)$ for this problem is exactly one. In Figure 2 we plot the
absolute values of the entries of the matrix $U$. As is evident, the larger values of $U$ are located
near its centre (recall that we enumerate over $\mathbb{Z}$ for the sampling basis and $\mathbb{N}$ for the sparsity
basis), and as one moves away from this region the values get progressively smaller.
This motivates the following general definition:
This motivates the following general definition: 

Definition 3.1. Let $U \in \mathcal{B}(l^2(\mathbb{N}))$ be an isometry. Then $U$ is asymptotically incoherent if

$$\mu(P_N^\perp U), \ \mu(U P_N^\perp) \to 0, \qquad N \to \infty. \qquad (3.1)$$

Here $P_N \in \mathcal{B}(l^2(\mathbb{N}))$ is the projection operator onto $\mathrm{span}\{e_j : j = 1, \ldots, N\}$, where $\{e_j\}_{j \in \mathbb{N}}$ is
the canonical basis for $l^2(\mathbb{N})$. In other words, $U$ is asymptotically incoherent if the coherence of
the infinite matrices formed by replacing either the first $N$ rows or columns of $U$ by zeros tends
to zero as $N \to \infty$.

Asymptotic incoherence in the case of Haar wavelets with Fourier sampling is indicated by
Figure 2. As it happens, this is always the case for the problem of Example 2.2, regardless of the
wavelet basis used. We have

Theorem 3.2. Consider the setup of Example 2.2. Then $\mu(U) \ge \epsilon |\hat{\Phi}(0)|^2$, where $\Phi$ is the
corresponding scaling function, and $\mu(P_N^\perp U), \ \mu(U P_N^\perp) = O(N^{-1})$ as $N \to \infty$.






Figure 2: The absolute values of the matrix U for Haar wavelets with Fourier sampling. Light 
regions correspond to large values and dark regions to small values. Observe the asymptotic 
incoherence as described by Theorem 3.2 
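A finite-dimensional stand-in for this picture can be computed directly. The sketch below uses the discrete Fourier-Haar matrix of Example 2.1 as a proxy for the infinite-dimensional operator (this substitution is an assumption made for illustration), orders the Fourier rows by increasing frequency magnitude, and evaluates the coherence of the matrix after its first $K$ rows are replaced by zeros, i.e. the row part of (3.1). The function names are hypothetical.

```python
import numpy as np

def haar_matrix(N: int) -> np.ndarray:
    """Orthonormal Haar analysis matrix (rows are Haar basis vectors); N a power of 2."""
    H = np.array([[1.0]])
    while H.shape[0] < N:
        n = H.shape[0]
        H = np.vstack([np.kron(H, [1.0, 1.0]), np.kron(np.eye(n), [1.0, -1.0])]) / np.sqrt(2.0)
    return H

def fourier_haar_by_frequency(N: int) -> np.ndarray:
    """Fourier-Haar matrix with rows ordered by increasing |frequency|: 0, 1, -1, 2, -2, ..."""
    idx = np.arange(N)
    F = np.exp(-2j * np.pi * np.outer(idx, idx) / N) / np.sqrt(N)
    freqs = np.fft.fftfreq(N, d=1.0 / N)               # integer frequencies in FFT order
    order = np.argsort(np.abs(freqs), kind="stable")   # low frequencies first
    return F[order] @ haar_matrix(N).T

def tail_coherence(U: np.ndarray, K: int) -> float:
    """Coherence of U after its first K rows are replaced by zeros."""
    return float(np.max(np.abs(U[K:]) ** 2)) if K < U.shape[0] else 0.0

U = fourier_haar_by_frequency(256)
tails = [tail_coherence(U, K) for K in (0, 4, 16, 64)]
print(tails)
```

The full matrix has coherence 1, but the values drop quickly once the lowest frequencies are excluded, which is the behaviour that the multilevel strategies discussed next are designed to exploit.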



3.2 Multi-level sampling 

Asymptotic incoherence suggests that a different subsampling strategy should be used instead of
standard random sampling. High coherence in the first few rows of $U$ means that important
information about the signal to be recovered may well be contained in its corresponding measurements.
Hence to ensure good recovery we should fully sample these rows. Once outside of this region,
when the coherence starts to decrease, we can begin to subsample. Let $N_1, N, m \in \mathbb{N}$ be given.
This now leads us to consider an index set $\Omega$ of the form $\Omega = \Omega_1 \cup \Omega_2$, where $\Omega_1 = \{1, \ldots, N_1\}$,
and $\Omega_2 \subseteq \{N_1 + 1, \ldots, N\}$ is chosen uniformly at random with $|\Omega_2| = m$. We refer to this as a
two-level sampling scheme. As we shall prove later, the amount of subsampling possible (i.e. the
parameter $m$) in the region corresponding to $\Omega_2$ will depend solely on the sparsity of the signal
and the coherence $\mu(P_{N_1}^\perp U)$.

The two-level scheme represents the simplest type of nonuniform density sampling. There is 
no reason, however, to restrict our attention to just two levels (full and subsampled). In general, 
we shall consider multilevel schemes, defined as follows: 

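Definition 3.3. Let $r \in \mathbb{N}$, $\mathbf{N} = (N_1, \ldots, N_r) \in \mathbb{N}^r$ with $1 \le N_1 < \ldots < N_r$, $\mathbf{m} = (m_1, \ldots, m_r) \in \mathbb{N}^r$,
with $m_k \le N_k - N_{k-1}$, $k = 1, \ldots, r$, and suppose that

$$\Omega_k \subseteq \{N_{k-1} + 1, \ldots, N_k\}, \qquad |\Omega_k| = m_k, \qquad k = 1, \ldots, r,$$

are chosen uniformly at random, where $N_0 = 0$. We refer to the set

$$\Omega = \Omega_{\mathbf{N},\mathbf{m}} := \Omega_1 \cup \ldots \cup \Omega_r$$

as an $(\mathbf{N}, \mathbf{m})$-multilevel sampling scheme.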
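Definition 3.3 translates directly into a sampling routine. The following is a minimal sketch using only the Python standard library; the function name `multilevel_scheme`, the seed argument and the example level sizes are illustrative assumptions.

```python
import random

def multilevel_scheme(N_levels, m_levels, seed=None):
    """Draw an (N, m)-multilevel sampling scheme as in Definition 3.3.

    N_levels = (N_1, ..., N_r) and m_levels = (m_1, ..., m_r); indices are 1-based.
    Returns the list [Omega_1, ..., Omega_r] of sorted index lists.
    """
    rng = random.Random(seed)
    omegas, previous = [], 0                      # previous plays the role of N_{k-1}, N_0 = 0
    for N_k, m_k in zip(N_levels, m_levels):
        if not previous < N_k or not 0 <= m_k <= N_k - previous:
            raise ValueError("need N_{k-1} < N_k and m_k <= N_k - N_{k-1}")
        band = range(previous + 1, N_k + 1)       # the band {N_{k-1}+1, ..., N_k}
        omegas.append(sorted(rng.sample(band, m_k)))   # uniform, without replacement
        previous = N_k
    return omegas

# Three levels: fully sample the first 8 indices, then subsample more and more sparsely.
levels = multilevel_scheme((8, 32, 128), (8, 12, 16), seed=0)
omega = [j for level in levels for j in level]
print([len(level) for level in levels], len(omega))
```

Choosing $m_1 = N_1$ forces the first level to be fully sampled, so a two-level scheme is the special case $r = 2$ with $m_1 = N_1$.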

The same guiding principle applies as in the two-level case. In the region of highest coherence,
i.e. $\Omega_1$, we take more measurements, and as the coherence decreases, i.e. as the level number $k$
increases, we take progressively fewer. Note that our introduction of multilevel schemes is not
just for the purposes of mathematical intricacy: in practice, they are often more effective than
two-level schemes.

Note that similar sampling strategies are found in most empirical studies on compressive MRI
[38, 39]. A closely related strategy was considered by Candes & Romberg in [13],
where the sampling levels correspond precisely to the wavelet scales. Our theory generalizes
this approach by removing this condition (see Remark 3.1). Moreover, our theorems also do not
require the image to be first separated into individual subbands before sampling, as is the case
in [13].

Another instance of a two-level strategy is found in [42]. Here the authors consider the application
of compressed sensing in fluorescence microscopy via a so-called "half-half" scheme.






3.3 Asymptotic sparsity in levels 

In the case of perfect incoherence, the standard random sampling strategies of compressed sensing
are highly effective for sparse signals. However, in the asymptotically incoherent setting, the notion
of sparsity can be substantially relaxed.

To explain this, let $x = (x_j)_{j \in \mathbb{N}} \in l^2(\mathbb{N})$ be the infinite vector of coefficients of a function $f$ in
the sparsity system $\{\varphi_j\}_{j \in \mathbb{N}}$. Suppose that $x$ was very sparse in its entries $j = 1, \ldots, M_1$, but the
exact location of these nonzero coefficients $x_j$ was unknown. Since the matrix $U$ is highly coherent
in its corresponding rows, there is no way we can exploit this sparsity to achieve subsampling.
High coherence forces us to sample fully the first $M_1$ rows, otherwise we run the risk of missing
critical information about $x$. On the other hand, once the asymptotic incoherence sets in we are
able to subsample the rows of $U$ and recover sparse sets of coefficients.

This means that there is nothing to be gained from high sparsity of $x$ in its first few entries.
However, we can expect to achieve significant subsampling if $x$ is asymptotically sparse, i.e. the
coefficients $x_j$, $j > M_1$, are sparse for some sufficiently large $M_1$. This motivates the following
two definitions:

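Definition 3.4. For $r \in \mathbb{N}$ let $\mathbf{M} = (M_1, \ldots, M_r) \in \mathbb{N}^r$ with $1 \le M_1 < \ldots < M_r$ and
$\mathbf{s} = (s_1, \ldots, s_r) \in \mathbb{N}^r$, with $s_k \le M_k - M_{k-1}$, $k = 1, \ldots, r$, where $M_0 = 0$. We say that $x \in l^2(\mathbb{N})$ is
$(\mathbf{s}, \mathbf{M})$-sparse if, for each $k = 1, \ldots, r$,

$$\Delta_k := \mathrm{supp}(x) \cap \{M_{k-1} + 1, \ldots, M_k\},$$

satisfies $|\Delta_k| \le s_k$. We denote the set of $(\mathbf{s}, \mathbf{M})$-sparse vectors by $\Sigma_{\mathbf{s},\mathbf{M}}$.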
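For a finite coefficient vector, the condition of Definition 3.4 is a simple per-level count. The sketch below is one way to check it; the function name is hypothetical, and the additional requirement that no nonzero entries lie beyond $M_r$ is an assumption added here, mirroring the $(s, M)$-sparsity of Section 2.2.

```python
def is_sparse_in_levels(x, M_levels, s_levels):
    """Check the per-level sparsity condition of Definition 3.4 for x = [x_1, x_2, ...]."""
    previous = 0                                   # M_0 = 0
    for M_k, s_k in zip(M_levels, s_levels):
        # x[previous:M_k] holds the entries with 1-based indices M_{k-1}+1, ..., M_k.
        nonzeros = sum(1 for v in x[previous:M_k] if v != 0)   # this is |Delta_k|
        if nonzeros > s_k:
            return False
        previous = M_k
    # Assumption (see lead-in): nothing may lie beyond the last level M_r.
    return all(v == 0 for v in x[previous:])

x = [3, 0, 1, 0, 0, 0, 2, 0, 0, 0, 0, 0]
print(is_sparse_in_levels(x, (4, 12), (2, 1)), is_sparse_in_levels(x, (4, 12), (1, 1)))
```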

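Definition 3.5. Let $f = \sum_{j \in \mathbb{N}} x_j \varphi_j \in \mathcal{H}$, where $x = (x_j)_{j \in \mathbb{N}} \in l^1(\mathbb{N})$. We say that $f$ is
$(\mathbf{s}, \mathbf{M})$-compressible with respect to $\{\varphi_j\}_{j \in \mathbb{N}}$ if $\sigma_{\mathbf{s},\mathbf{M}}(f)$ is small, where

$$\sigma_{\mathbf{s},\mathbf{M}}(f) := \min_{\eta \in \Sigma_{\mathbf{s},\mathbf{M}}} \|x - \eta\|_{l^1}. \qquad (3.2)$$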
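For a finite coefficient vector the minimum in (3.2) has a closed form: within each level the best approximation keeps the $s_k$ largest entries in modulus, so the error is the $l^1$ norm of whatever is discarded. The sketch below computes this; the function name is hypothetical, and treating entries beyond $M_r$ as entirely lost is an assumption (it presumes that sparse vectors in levels are supported in $\{1, \ldots, M_r\}$).

```python
def sigma_levels(x, M_levels, s_levels):
    """l^1 distance from x to the nearest vector that is sparse in levels, as in (3.2)."""
    error, previous = 0.0, 0
    for M_k, s_k in zip(M_levels, s_levels):
        magnitudes = sorted((abs(v) for v in x[previous:M_k]), reverse=True)
        error += sum(magnitudes[s_k:])             # everything except the s_k largest
        previous = M_k
    return error + sum(abs(v) for v in x[previous:])   # assumed: tail beyond M_r is lost

x = [4, -1, 0.5, 0, 2, 0.25, 0, 0.125]
print(sigma_levels(x, (4, 8), (1, 1)))
```

In the example, the first level discards 1 and 0.5 and the second level discards 0.25 and 0.125.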

As we shall explain, signals possessing this sparsity pattern (which we henceforth refer to as
being asymptotically sparse in levels) are ideally suited to multilevel sampling schemes. Roughly
speaking, the number of measurements $m_k$ required in each band $\Omega_k$ is determined by the sparsity
of $f$ in the corresponding band and the asymptotic coherence.



Remark 3.1 In Section 5.1 we shall show that natural images are asymptotically compressible
when the levels $\mathbf{M}$ correspond to the wavelet scales. In this case it is somewhat natural, although
not necessary, to employ a multilevel sampling scheme corresponding exactly to these levels. As
mentioned in Section 3.2, this particular approach was previously considered in [13].



4 Main theorems 

4.1 Two-level sampling schemes
4.1.1 The sparse and noiseless case 

We shall commence with the finite-dimensional result concerning exact recovery, but before that 
we need a definition of local coherence: 

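Definition 4.1. Let $U \in \mathcal{B}(l^2(\mathbb{N}))$ be an isometry. Given $N \in \mathbb{N}$ we define

$$\mu_N = \mu(P_N^\perp U).$$

If $\mathbf{N} = (N_1, \ldots, N_r) \in \mathbb{N}^r$ and $\mathbf{M} = (M_1, \ldots, M_r) \in \mathbb{N}^r$ with $1 \le N_1 < \ldots < N_r$ and $1 \le M_1 <
\ldots < M_r$ we define the $(k, l)$th local coherence of $U$ with respect to $\mathbf{N}$ and $\mathbf{M}$ by

$$\mu_{\mathbf{N},\mathbf{M}}(k, l) = \sqrt{\mu\left(P_{N_k}^{N_{k-1}} U P_{M_l}^{M_{l-1}}\right) \cdot \mu\left(P_{N_{k-1}}^{\perp} U\right)}, \qquad k, l = 1, \ldots, r,$$

where $N_0 = M_0 = 0$ and $P_b^a$ denotes the projection corresponding to indices $\{a + 1, \ldots, b\}$.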






We can now state the first main theorem.

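Theorem 4.2. Let $U \in \mathbb{C}^{N \times N}$ be an isometry and $x \in \mathbb{C}^N$ be $(\mathbf{s}, \mathbf{M})$-sparse, where $r = 2$,
$\mathbf{s} = (M_1, s_2)$, $s = M_1 + s_2$ and $\mathbf{M} = (M_1, M_2)$ with $M_2 = N$. Suppose that

$$\|P_{N_1}^\perp U P_{M_1}\| \le \frac{\gamma}{\sqrt{M_1}} \qquad (4.1)$$

for some $1 \le N_1 < N$ and $\gamma \in (0, 2/5]$, and that $\gamma \le s_2 \sqrt{\mu_{N_1}}$. For $\epsilon > 0$, let $m$ satisfy

$$m \gtrsim (N - N_1) \cdot (\log(s \epsilon^{-1}) + 1) \cdot \mu_{N_1} \cdot s_2 \cdot \log(N).$$

Let $\Omega = \Omega_{\mathbf{N},\mathbf{m}}$ be a two-level sampling scheme, where $\mathbf{N} = (N_1, N_2)$ and $\mathbf{m} = (m_1, m_2)$ with
$N_2 = N$, $m_1 = N_1$ and $m_2 = m$, and suppose that $\xi \in \mathbb{C}^N$ is a minimizer of (2.3), where $\hat{f} = Ux$.
Then, with probability exceeding $1 - \epsilon$, $\xi$ is unique and $\xi = x$.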

Note that the proof of this theorem includes the standard random sampling compressed sensing
results of §2.1 as a special case. This follows by allowing $M_1 = N_1 = 0$, in which case (4.1) is
redundant.

To state the corresponding result in the infinite-dimensional case, we require the following
definition of the balancing property [2]:

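Definition 4.3. Let $U \in \mathcal{B}(l^2(\mathbb{N}))$ be an isometry. Then $N \in \mathbb{N}$ and $K \ge 1$ satisfy the weak
balancing property with respect to $U$, $M \in \mathbb{N}$ and $s \in \mathbb{N}$ if

$$\|P_M U^* P_N U P_M - P_M\|_{l^\infty \to l^\infty} \le \frac{1}{8} \left( \log_2^{1/2}\left(4 \sqrt{s} K M\right) \right)^{-1}, \qquad (4.2)$$

where $\|\cdot\|_{l^\infty \to l^\infty}$ is the norm on $\mathcal{B}(l^\infty(\mathbb{N}))$. We say that $N$ and $K$ satisfy the strong balancing
property with respect to $U$, $M$ and $s$ if (4.2) holds, as well as

$$\|P_M^\perp U^* P_N U P_M\|_{l^\infty \to l^\infty} \le \frac{1}{8}. \qquad (4.3)$$

We now have the following: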



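Theorem 4.4. Let $U \in \mathcal{B}(l^2(\mathbb{N}))$ be an isometry and $x \in l^1(\mathbb{N})$ be $(\mathbf{s}, \mathbf{M})$-sparse, where $r = 2$,
$\mathbf{s} = (M_1, s_2)$ and $\mathbf{M} = (M_1, M_2)$. Suppose $N_1, N_2, m_2 \in \mathbb{N}$ are such that the parameters

$$N := N_2, \qquad K := (N_2 - N_1)/m_2,$$

satisfy the weak balancing property with respect to $U$, $M := M_2$ and $s := M_1 + s_2$, and that, for
some $\gamma \in (0, 2/5]$,

$$\|P_{N_1}^\perp U P_{M_1}\| \le \frac{\gamma}{\sqrt{M_1}},$$

and $\gamma \le s_2 \sqrt{\mu_{N_1}}$. For $\epsilon > 0$, let $m$ satisfy

$$m \gtrsim (N - N_1) \cdot (\log(s \epsilon^{-1}) + 1) \cdot \mu_{N_1} \cdot s_2 \cdot \log\left(K M \sqrt{s}\right), \qquad (4.4)$$

and suppose that $\Omega = \Omega_{\mathbf{N},\mathbf{m}}$ is a two-level sampling scheme, where $\mathbf{N} = (N_1, N_2)$ and $\mathbf{m} =
(m_1, m_2)$ with $m_1 = N_1$ and $m_2 = m$. Let $\xi \in l^1(\mathbb{N})$ be a minimizer of (2.5), where $\hat{f} = Ux$.
Then, with probability exceeding $1 - \epsilon$, $\xi$ is unique and coincides with $x$. If $m = N - N_1$ then this
holds with probability 1.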

Note that this theorem generalizes the infinite-dimensional compressed sensing result of [2]
(see also §2.2) to the two-level sampling case.






4.1.2 The role of the balancing property 



The main difference between the finite- and infinite-dimensional theorems (Theorems 4.2 and 4.4)
is that the parameters $N$ and $K$ in the latter must satisfy the weak balancing property.

The balancing property ensures that the truncated matrix $P_N U P_M$ is close to an isometry.
In reconstruction problems, the presence of an isometry ensures stability in the mapping between
measurements and coefficients [3], which explains the need for such a property in our theorems.
As explained in [2], without the balancing property the lack of stability in the underlying mapping
frequently leads to numerically useless reconstructions.

Note that the balancing property always holds, provided $N$ is chosen sufficiently large in
comparison to $M$. For details we refer to [2]. On the other hand, no balancing property is required
in the finite-dimensional case since $P_N U P_M = U$ is an isometry by assumption.
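The quantity controlled by the weak balancing property can be computed for a finite matrix. The sketch below evaluates the left-hand side of (4.2), using the fact that the $l^\infty \to l^\infty$ operator norm of a matrix is its maximum absolute row sum. The random unitary matrix is a hypothetical test case standing in for a truncation of an infinite-dimensional operator, and the function name is an illustrative choice.

```python
import numpy as np

def balancing_defect(U: np.ndarray, N: int, M: int) -> float:
    """|| P_M U^* P_N U P_M - P_M ||_{l^inf -> l^inf} for a finite matrix U."""
    block = U[:N, :M]                              # nonzero part of P_N U P_M
    G = block.conj().T @ block - np.eye(M)         # Gram matrix minus the identity
    return float(np.max(np.sum(np.abs(G), axis=1)))   # maximum absolute row sum

rng = np.random.default_rng(0)
A = rng.standard_normal((128, 128)) + 1j * rng.standard_normal((128, 128))
U, _ = np.linalg.qr(A)                             # a generic unitary matrix (test case only)

M = 16
defects = {N: balancing_defect(U, N, M) for N in (16, 64, 128)}
print(defects)
```

When all 128 rows are kept, the columns of the truncated matrix are orthonormal and the defect vanishes up to rounding, which is the finite-dimensional situation described above; with $N = M$ the defect is far from zero.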

4.1.3 The noisy, nonsparse case 

In realistic problems, signals are never exactly sparse (or asymptotically sparse), and their
measurements are always contaminated by noise. Let $f = \sum_j x_j \varphi_j$ be a fixed signal, and write

$$y = P_\Omega \hat{f} + z = P_\Omega U x + z,$$

for its noisy measurements, where $z \in \mathrm{ran}(P_\Omega)$ is a noise vector satisfying $\|z\| \le \delta$ for some $\delta \ge 0$.
If $\delta$ is known, we now consider the following problem:

$$\inf_{\eta \in l^1(\mathbb{N})} \|\eta\|_{l^1} \ \text{subject to} \ \|P_\Omega U \eta - y\| \le \delta. \qquad (4.5)$$



We now state our main result on (4.5) for two-level schemes. Since the finite-dimensional case is
a straightforward corollary of the infinite-dimensional result, we present only the latter:

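Theorem 4.5. Let $U \in \mathcal{B}(l^2(\mathbb{N}))$ be an isometry and $x \in l^1(\mathbb{N})$. Suppose that $\Omega = \Omega_{\mathbf{N},\mathbf{m}}$
is a two-level sampling scheme, where $\mathbf{N} = (N_1, N_2)$ and $\mathbf{m} = (N_1, m_2)$. Let $(\mathbf{s}, \mathbf{M})$, where
$\mathbf{M} = (M_1, M_2) \in \mathbb{N}^2$, $M_1 < M_2$, and $\mathbf{s} = (M_1, s_2) \in \mathbb{N}^2$, be any pair such that the following holds:

(i) we have $\|P_{N_1}^\perp U P_{M_1}\| \le \gamma / \sqrt{M_1}$ and $\gamma \le s_2 \sqrt{\mu_{N_1}}$ for some $\gamma \in (0, 2/5]$;

(ii) the parameters

$$N := N_2, \qquad K := (N_2 - N_1)/m_2$$

satisfy the strong balancing property with respect to $U$, $M := M_2$ and $s := M_1 + s_2$;

(iii) for $\epsilon > 0$, let

$$m_2 \gtrsim (N - N_1) \cdot (\log(s \epsilon^{-1}) + 1) \cdot \mu_{N_1} \cdot s_2 \cdot \log\left(K M \sqrt{s}\right).$$

Suppose that $\xi \in l^1(\mathbb{N})$ is a minimizer of (4.5). Then, with probability exceeding $1 - \epsilon$, we have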
lie - ^11 < C (1 + VT) • ((5 • (1 + L • v^) + as,M(/)) , (4.6) 



for some constant C, where cTs^mif) as in (3.2), and L = V6K ^| + + i ^^^fflrM J^) ") * 
If $m_2 = N - N_1$ then this holds with probability 1.



Note that the constant $C$ in (4.6) (and also in Theorems 4.7 and 4.8 later) is exactly the
constant of Proposition 6.1, and can be found explicitly by following the steps of the proof.
However, we have made no attempts to optimize this constant, hence we leave it in this form.






4.1.4 Discussion 



Theorems 4.2 and 4.4 demonstrate that asymptotic incoherence and two-level sampling overcome
the coherence barrier. To see this, note the following:

(i) The condition $\|P_{N_1}^\perp U P_{M_1}\| \le \gamma / \sqrt{M_1}$ (which is always satisfied for some $N_1$, since $U$ is an
isometry) implies that fully sampling the first $N_1$ measurements allows one to recover the
first $M_1$ coefficients of $f$.

(ii) To recover the remaining $s_2$ coefficients we require, up to log factors, an additional

$$m_2 \gtrsim (N - N_1) \cdot \mu_{N_1} \cdot s_2$$

measurements, taken randomly from the range $N_1 + 1, \ldots, N_2$. In particular, if $N_1$ is a
fixed fraction of $N$, and if $\mu_{N_1} = O(N_1^{-1})$, such as for wavelets with Fourier measurements
(Theorem 3.2), then one requires only $m_2 \gtrsim s_2$ additional measurements to recover the
sparse part of the signal.



(iii) When $f$ is asymptotically sparse, as is the case for natural images (see Section 5.1), the
relative size of $s_2$ will become smaller as $M$ and $N$ grow. In particular, the percentage
$\frac{N_1 + m_2}{N} \times 100$ of measurements required will decrease. Hence the subsampling rate possible
will improve as the problem resolution becomes larger (see Section 5.2).
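The resolution dependence described in (ii) and (iii) can be illustrated with a back-of-the-envelope calculation. The sketch below ignores log factors and the hidden constant, assumes the decay $\mu_{N_1} = 1/N_1$, fixes $N_1 = N/16$, and uses a purely hypothetical sparsity $s_2 = 2\sqrt{N}$ that grows more slowly than $N$; none of these numbers come from the paper.

```python
import math

def two_level_fraction(N: int, N1: int, s2: int, mu_const: float = 1.0) -> float:
    """Fraction (N1 + m2) / N of samples used by a two-level scheme.

    Uses m2 = (N - N1) * mu_{N1} * s2 from item (ii), without log factors or
    constants, under the assumed decay mu_{N1} = mu_const / N1.
    """
    mu_N1 = mu_const / N1
    m2 = min(N - N1, (N - N1) * mu_N1 * s2)   # cannot exceed the size of the second band
    return (N1 + m2) / N

fractions = []
for N in (1024, 4096, 16384):
    N1 = N // 16                      # fully sampled block: a fixed fraction of N
    s2 = 2 * math.isqrt(N)            # hypothetical sparsity, growing like sqrt(N)
    fractions.append(two_level_fraction(N, N1, s2))
print(fractions)
```

Under these assumptions the smallest problem requires full sampling, while the required fraction falls steadily as $N$ grows, in line with the qualitative statement of (iii).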



Note also that the two-level sampling scheme is completely robust in the presence of noise and
inexact asymptotic sparsity, as shown by Theorem 4.5.



Remark 4.1 It is not necessary to know the sparsity structure, i.e. the values $\mathbf{s}$ and $\mathbf{M}$, of
the image $f$ in order to implement the two-level sampling technique (the same also applies to the
multilevel technique discussed in the next section). Given a two-level scheme $\Omega = \Omega_{\mathbf{N},\mathbf{m}}$, Theorem
4.5 demonstrates that $f$ will be recovered exactly up to an error on the order of $\sigma_{\mathbf{s},\mathbf{M}}(f)$, where $\mathbf{s}$
and $\mathbf{M}$ are determined implicitly by $\mathbf{N}$, $\mathbf{m}$ and the conditions (i)-(iii) of the theorem. Of course,
some a priori knowledge of $\mathbf{s}$ and $\mathbf{M}$ will greatly assist in selecting the parameters $\mathbf{N}$ and $\mathbf{m}$ so
as to get the best recovery results. However, this is not necessary for implementing the method.



Remark 4.2 To simplify their presentation, Theorems 4.2, 4.4 and 4.5 contain the additional
condition $\gamma \le s_2 \sqrt{\mu_{N_1}}$. This is a lower bound for the sparsity $s_2$ in the second level. It is possible
to remove this condition, in which case the corresponding estimate for $m_2$ will be



■ MiVi ■ S2 ) 



(4.7) 



plus log factors. Note that adding the constraint $\gamma \le s_2 \sqrt{\mu_{N_1}}$ merely reduces (4.7) to that given
in the theorems. As we explain in Section 4.2.3, the reason for the additional term in (4.7) is
due to the phenomenon of interference between the two sparsity levels. Fortunately, however, the
condition $\gamma \le s_2 \sqrt{\mu_{N_1}}$ is always satisfied in practice. Typically both $s_2$ and $N_1$ are
fixed percentages of the total resolution $N$ (see Section 5.2). If $\mu_{N_1} = O(1/N_1)$, for example, then
this condition will always hold for all reasonably large $N$.



4.2 Multilevel sampling schemes 

We now consider multilevel sampling schemes with arbitrary numbers of levels. Before we present 
our main theorems we require the following definition: 

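Definition 4.6. Let $U$ be an isometry of either $\mathbb{C}^{N \times N}$ or $\mathcal{B}(l^2(\mathbb{N}))$. For $\mathbf{N} = (N_1, \ldots, N_r) \in \mathbb{N}^r$,
$\mathbf{M} = (M_1, \ldots, M_r) \in \mathbb{N}^r$ with $1 \le N_1 < \ldots < N_r$ and $1 \le M_1 < \ldots < M_r$, $\mathbf{s} = (s_1, \ldots, s_r) \in \mathbb{N}^r$
and $1 \le k \le r$, let

$$S_k = S_k(\mathbf{N}, \mathbf{M}, \mathbf{s}) = \max_{\eta \in \Theta} \|P_{N_k}^{N_{k-1}} U \eta\|^2,$$

where $N_0 = M_0 = 0$ and $\Theta$ is given by

$$\Theta = \left\{ \eta : \|\eta\|_{l^\infty} \le 1, \ |\mathrm{supp}(P_{M_l}^{M_{l-1}} \eta)| = s_l, \ l = 1, \ldots, r \right\}.$$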





4.2.1 The finite-dimensional case 

We start with the finite-dimensional case. For brevity, we now only present our results for the 
noisy, nonsparse case. The corresponding theorems for the exactly sparse, noise-free case are 
straightforward corollaries. 

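Theorem 4.7. Let $U \in \mathbb{C}^{N \times N}$ be an isometry and $x \in \mathbb{C}^N$. Suppose that $\Omega = \Omega_{\mathbf{N},\mathbf{m}}$ is a
multilevel sampling scheme, where $\mathbf{N} = (N_1, \ldots, N_r) \in \mathbb{N}^r$ and $\mathbf{m} = (m_1, \ldots, m_r) \in \mathbb{N}^r$. Let
$(\mathbf{s}, \mathbf{M})$, where $\mathbf{M} = (M_1, \ldots, M_r) \in \mathbb{N}^r$, $M_1 < \ldots < M_r$, and $\mathbf{s} = (s_1, \ldots, s_r) \in \mathbb{N}^r$, be any pair
such that the following holds: for $\epsilon > 0$ and $1 \le k \le r$,

$$1 \gtrsim \frac{N_k - N_{k-1}}{m_k} \cdot (\log(s \epsilon^{-1}) + 1) \cdot \left( \mu_{\mathbf{N},\mathbf{M}}(k, 1) \cdot s_1 + \ldots + \mu_{\mathbf{N},\mathbf{M}}(k, r) \cdot s_r \right) \cdot \log(N), \qquad (4.8)$$

where $s := s_1 + \ldots + s_r$ and

$$m_k \gtrsim \hat{m}_k \cdot (\log(s \epsilon^{-1}) + 1) \cdot \log(N),$$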

where rhk satisfies 

1 > ^— ^ - 1 j . . .1 + . . . + (^— — - 1 j • M^.-. • (4.9) 

for any 5i, . . . , G (0, oo) such that 

si + . . . + < si + . . . + 5r, Sk < 5'/c(N, M, s). 



Suppose that ^ G /"^(N) is a minimizer of |^.5p . T/ien^ m^/i probability exceeding I — e, we have 
that 

11^ - x|| < C • (l + V^) • (J • (1 + L • v^) + as,M(/)) , (4.10) 



for some constant C, where cTs^mif) is as in (3.2), L = \^6K (^| + \/-*- ^ (4^^ V^) ") ^ 
max/e=i ... r 1 r- ^/^/c = J^k — < k <r, then this holds with probability 1. 



4.2.2 Example: the block-diagonal case 



The most important part of Theorem 4.7 is the bound on the number of samples $m_k$ required in the $k^{\mathrm{th}}$ level of the multilevel scheme. This depends completely on the behaviour of the quantity $S_k(\mathbf{N}, \mathbf{M}, \mathbf{s})$ and the local coherences $\mu_{\mathbf{N},\mathbf{M}}(k,l)$.

Consider the case where the isometry $U \in \mathbb{C}^{N \times N}$ is a block diagonal matrix, with the $k^{\mathrm{th}}$ block being of size $(N_k - N_{k-1}) \times (M_k - M_{k-1})$. In this setting, $S_k = s_k$ for all $k$, and therefore we may set $\tilde{s}_k = s_k$ in Theorem 4.7. Since one also has in this case that $\mu_{\mathbf{N},\mathbf{M}}(k,l) = 0$ whenever $k \neq l$, and since $\mu_{\mathbf{N},\mathbf{M}}(k,k) \leq \mu_{N_{k-1}}$, equations (4.8) and (4.9) reduce to

$$1 \gtrsim \frac{N_k - N_{k-1}}{m_k} \cdot (\log(s\epsilon^{-1}) + 1) \cdot \mu_{N_{k-1}} \cdot s_k \cdot \log N,$$

and

$$1 \gtrsim \left( \frac{N_1 - N_0}{\hat{m}_1} - 1 \right) \cdot \mu_{N_0} \cdot s_1 + \ldots + \left( \frac{N_r - N_{r-1}}{\hat{m}_r} - 1 \right) \cdot \mu_{N_{r-1}} \cdot s_r.$$

In particular, it suffices to take

$$m_k \gtrsim (N_k - N_{k-1}) \cdot (\log(s\epsilon^{-1}) + 1) \cdot \mu_{N_{k-1}} \cdot s_k \cdot \log N, \qquad 1 \leq k \leq r.$$

This is as one might expect: the number of measurements in the $k^{\mathrm{th}}$ level depends on the size of the level multiplied by the asymptotic incoherence and the sparsity in that level.
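To make the scaling in the block-diagonal bound concrete, the following minimal sketch evaluates its right-hand side level by level. The implied constant is not specified in the text, so it is taken to be 1 here (an assumption), and the function name and the numerical values in the example are hypothetical.

```python
import math

def block_diagonal_samples(N_levels, coherences, sparsities, eps):
    """Evaluate (N_k - N_{k-1}) * (log(s/eps) + 1) * mu_{N_{k-1}} * s_k * log(N)
    for each level k, rounded up and capped at the level width N_k - N_{k-1}.

    Assumption: the unspecified constant in the bound is set to 1.
    """
    s_total = sum(sparsities)              # s = s_1 + ... + s_r
    N = N_levels[-1]                       # N = N_r
    log_factor = (math.log(s_total / eps) + 1.0) * math.log(N)
    bounds, prev = [], 0
    for N_k, mu_k, s_k in zip(N_levels, coherences, sparsities):
        width = N_k - prev
        m_k = width * log_factor * mu_k * s_k
        bounds.append(min(width, math.ceil(m_k)))
        prev = N_k
    return bounds

# Hypothetical three-level example with coherences decaying like 1/N_{k-1}.
levels = [256, 4096, 65536]
mus = [1.0, 1.0 / 256, 1.0 / 4096]
s = [4, 8, 8]
print(block_diagonal_samples(levels, mus, s, eps=0.5))
```

With the constant set to 1 the log factors are pessimistic, so the coarse levels are typically capped at full sampling; the point of the sketch is only the dependence on level width, coherence and local sparsity.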






4.2.3 Sharpness of the estimates 



When $U$ is not block diagonal the inequalities in Theorem 4.7 cannot be reduced to such a simple form. In general, there can be interference between different sparsity levels, which means that $S_k$ can be larger than $s_k$. Indeed, in general one has the upper bound

$$S_k \leq s = s_1 + \ldots + s_r,$$

and it is possible to show that there are matrices $U$ for which $S_k = s$ for all $k = 1, \ldots, r$. In such cases Theorem 4.7 predicts that $m_k$ must scale at least like the total sparsity $s$, as opposed to the local sparsity $s_k$.

This actually turns out to be a sharp result. Consider the following construction. Let $N = rn$ for some $n \in \mathbb{N}$ and $\mathbf{N} = \mathbf{M} = (n, 2n, \ldots, rn)$. Let $W \in \mathbb{C}^{n \times n}$ be any isometry, and suppose that $V \in \mathbb{C}^{r \times r}$ is a full isometry, i.e. all its entries are nonzero (for example, $V$ could be the DFT matrix). Let $U = V \otimes W$, where $\otimes$ is the usual Kronecker product, and note that $U \in \mathbb{C}^{N \times N}$ is also an isometry. Now suppose that $x = (x_1, \ldots, x_r) \in \mathbb{C}^N$ is an $(\mathbf{s}, \mathbf{M})$-sparse vector, where each $x_k \in \mathbb{C}^n$ is $s_k$-sparse. Suppose also that $s < n/r$ and that the $x_k$'s have disjoint support sets, i.e. $\mathrm{supp}(x_k) \cap \mathrm{supp}(x_l) = \emptyset$, $k \neq l$. Then one has that

$$Ux = y, \qquad y = (y_1, \ldots, y_r), \qquad y_k = W z_k, \qquad z_k = \sum_{l=1}^{r} V_{kl} x_l.$$

Hence the problem of recovering $x$ from measurements $y$ with an $(\mathbf{N}, \mathbf{m})$-multilevel strategy decouples into $r$ problems of recovering the vector $z_k$ from measurements $y_k = W z_k$, $k = 1, \ldots, r$. However, by construction each vector $z_k$ is exactly $s$-sparse. Hence, since the coherence provides an information-theoretic limit [12], one requires at least $m_k \gtrsim n \cdot \mu(W) \cdot s \cdot \log n$ measurements at level $k$ in order to recover each $z_k$, and therefore recover $x$, regardless of the reconstruction method used. This is exactly as predicted by Theorem 4.7 for this example.
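The interference in this construction can be checked numerically. The sketch below is an illustration under stated assumptions: $V$ is taken to be the unitary DFT matrix, the supports and the sizes $r = 3$, $n = 24$ are hypothetical, and $W$ is omitted because it plays no role in counting the support of $z_k$.

```python
import cmath

def dft_matrix(r):
    """Unitary r x r DFT matrix; every entry has modulus 1/sqrt(r), so none is zero."""
    scale = 1.0 / r ** 0.5
    return [[scale * cmath.exp(-2j * cmath.pi * k * l / r) for l in range(r)]
            for k in range(r)]

def mixed_vectors(V, blocks):
    """Return z_k = sum_l V[k][l] * x_l for each k, where blocks[l] holds x_l."""
    r, n = len(blocks), len(blocks[0])
    return [[sum(V[k][l] * blocks[l][i] for l in range(r)) for i in range(n)]
            for k in range(r)]

def support_size(v, tol=1e-12):
    """Number of entries of v whose modulus exceeds tol."""
    return sum(1 for entry in v if abs(entry) > tol)

# Hypothetical example: r = 3 levels of length n = 24 with disjoint supports.
r, n = 3, 24
supports = [[0, 1], [4], [7, 8, 9]]          # s_1 = 2, s_2 = 1, s_3 = 3
blocks = []
for supp in supports:
    x_l = [0.0] * n
    for i in supp:
        x_l[i] = 1.0
    blocks.append(x_l)

z = mixed_vectors(dft_matrix(r), blocks)
s_total = sum(len(supp) for supp in supports)
print([support_size(z_k) for z_k in z], "total sparsity:", s_total)
```

Each $z_k$ has as many nonzero entries as the total sparsity $s$, even though the individual blocks $x_k$ are only $s_k$-sparse.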

Fortunately, such cases are extreme, and do not arise in the main application considered in this paper, namely that of recovering wavelet coefficients from Fourier samples. In fact, in this case, numerical results suggest that $S_k \approx s_k$, and therefore the subsampling permitted at level $k$ is dictated mostly by the sparsity $s_k$ and the asymptotic coherence $\mu_{N_{k-1}}$. Note that the other local coherences $\mu_{\mathbf{N},\mathbf{M}}(k,l)$ with $l \neq k$ do have an effect. However, this tends to be negligible, since they are usually much smaller than $\mu_{\mathbf{N},\mathbf{M}}(k,k)$ in practice.

4.2.4 The infinite-dimensional case 

In the infinite-dimensional setting our main theorem is as follows: 

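Theorem 4.8. Let $U \in \mathcal{B}(\ell^2(\mathbb{N}))$ be an isometry and $x \in \ell^1(\mathbb{N})$. Suppose that $\Omega = \Omega_{\mathbf{N},\mathbf{m}}$ is a multilevel sampling scheme, where $\mathbf{N} = (N_1, \ldots, N_r) \in \mathbb{N}^r$ and $\mathbf{m} = (m_1, \ldots, m_r) \in \mathbb{N}^r$. Let $(\mathbf{s}, \mathbf{M})$, where $\mathbf{M} = (M_1, \ldots, M_r) \in \mathbb{N}^r$, $M_1 < \ldots < M_r$, and $\mathbf{s} = (s_1, \ldots, s_r) \in \mathbb{N}^r$, be any pair such that the following holds:

(i) the parameters

$$N := N_r, \qquad K := \max_{k=1,\ldots,r} \left\{ \frac{N_k - N_{k-1}}{m_k} \right\}$$

satisfy the strong balancing property with respect to $U$, $M := M_r$ and $s := s_1 + \ldots + s_r$;

(ii) for $\epsilon > 0$ and $1 \leq k \leq r$,

$$1 \gtrsim \frac{N_k - N_{k-1}}{m_k} \cdot (\log(s\epsilon^{-1}) + 1) \cdot \left( \mu_{\mathbf{N},\mathbf{M}}(k,1) \cdot s_1 + \ldots + \mu_{\mathbf{N},\mathbf{M}}(k,r) \cdot s_r \right) \cdot \log(K \tilde{M} \sqrt{s}),$$

where $s := s_1 + \ldots + s_r$ and

$$m_k \gtrsim \hat{m}_k \cdot (\log(s\epsilon^{-1}) + 1) \cdot \log(K \tilde{M} \sqrt{s}),$$

where $\hat{m}_k$ satisfies (4.9) and

$$\tilde{M} = \min\{ i \in \mathbb{N} : \| P_N U P^{\perp}_{i-1} \| \leq 1/(32 K \sqrt{s}) \}.$$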








Figure 3: Top row: test functions. Bottom row: percentage of wavelet coefficients at each scale 
which are greater than 10~^ in magnitude. 



Suppose that ^ G 1^{N) is a minimizer of |^.5| ). Then, with probability exceeding I — e, 



for some constant C , where os^-Mif) is as in (3.2), and L 
Nk — Nk-i for 1 < k < r then this holds 



/6K 



rrik 



with probability 1. 



1 



log2(e-i) + l \ 
log2(4KMy^) J 



(4.11) 



This theorem is very similar to that of the finite-dimensional case (Theorem 4.7), except for two key differences. First, as in the two-level setting, it requires the balancing property (see Section 4.1.2). Second, the final logarithm involves the quantity $\tilde{M}$, the minimum defined in the theorem. Note that $\tilde{M}$ is finite, since $U$ is an isometry. In the case of Fourier sampling with wavelets, for example, $\tilde{M} = \mathcal{O}(KN)$.



5 Main consequences 

We are now in a position to discuss the main consequences of our theorems. This is the content of Sections 5.2-5.4. Before doing so, however, we first need to explain the relevance of our framework.



5.1 Real-life images are asymptotically sparse in wavelets 

The reader may at this stage be asking whether or not the new definitions of asymptotic sparsity 
and compressibility are reasonable in practice. The answer to this is emphatically yes: natural, 
real-life images possess exactly these types of sparsity patterns in their wavelet coefficients. 

In Figure 3 we tabulate the number of significant wavelet coefficients at each wavelet scale for two test functions. At coarse scales, there is very little sparsity. However, sparsity rapidly increases as the wavelet scale gets finer, demonstrating asymptotic sparsity in both cases.

This is not a new insight. Take, for example, a piecewise constant function $f$ of one variable with a jump at an irrational point. It is well known that the number of nonzero wavelet coefficients at level $k$ scales like the logarithm of the total number of coefficients at that level. More generally, the wavelet coefficients of natural images possess a sparse tree structure [16]. This fits exactly into the notion of asymptotic sparsity in levels.
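The asymptotic sparsity of a piecewise constant function is easy to observe for Haar wavelets, where a detail coefficient can be nonzero only if the wavelet's support contains a jump. The following sketch is an illustration rather than the experiment behind Figure 3: it assumes midpoint samples of the indicator of $[0.2, 0.8]$, a discrete orthonormal Haar transform, and a hypothetical significance threshold `tol`.

```python
def haar_details(samples):
    """Discrete orthonormal Haar transform of a list whose length is a power of two.
    Returns the detail coefficients per scale, ordered from coarsest to finest."""
    approx = list(samples)
    details = []
    root2 = 2.0 ** 0.5
    while len(approx) > 1:
        pairs = [(approx[2 * i], approx[2 * i + 1]) for i in range(len(approx) // 2)]
        details.append([(a - b) / root2 for a, b in pairs])
        approx = [(a + b) / root2 for a, b in pairs]
    return details[::-1]

def significant_fraction_per_scale(samples, tol=1e-10):
    """Fraction of detail coefficients at each scale with magnitude above tol."""
    return [sum(1 for d in level if abs(d) > tol) / len(level)
            for level in haar_details(samples)]

N = 1024
phantom = [1.0 if 0.2 <= (i + 0.5) / N <= 0.8 else 0.0 for i in range(N)]
for scale, frac in enumerate(significant_fraction_per_scale(phantom)):
    print(f"scale {scale}: {100 * frac:.2f}% significant")
```

At most two coefficients per scale are significant (one per jump), so the significant fraction decays geometrically as the scale gets finer.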






Figure 4: (a): the 1D phantom (5.1). (b): the minimum subsampling percentage $p$ for success of the two-level sampling scheme, plotted against the resolution level (128 to 4096).



5.2 First consequence: the success of compressed sensing is resolution 
dependent 

We now discuss the first, and main, consequence of our theorems: the success of compressed 
sensing is resolution dependent. 

Let us explain this phenomenon in more detail. Suppose the resolution $M$ in the wavelet domain is relatively low. The image $f$, although asymptotically sparse, is not particularly sparse at this resolution. Therefore, regardless of how we choose to recover $f$, we will require a relatively large proportion of measurements. Taking too few measurements will give a poor quality reconstruction and, as discussed in Section 1, lead to erroneous conclusions about the usefulness of compressed sensing.

On the other hand, when $M$ is large, asymptotic sparsity kicks in and $f$ becomes increasingly sparse at finer wavelet scales. As shown by our theorems, multilevel sampling allows one to exploit this property and recover $f$ using a much smaller percentage of total samples than in the low resolution case. Hence compressed sensing becomes substantially more effective at higher resolutions.

Example 5.1 Consider the setup of Example 2.2 with the 1D phantom (visualized in Figure 4(a)) given by

$$f(t) = \chi_{[0.2,0.8]}(t), \qquad t \in [0,1]. \qquad (5.1)$$

We sample this function by measuring its Fourier coefficients, and take the sparsity system $\{\varphi_j\}_{j \in \mathbb{N}}$ to be the orthonormal basis of Haar wavelets on $[0,1]$. We then take a two-level sampling scheme with $p/2\%$ fixed samples and $p/2\%$ random samples, where $p$ is the total subsampling percentage. In our experiment, we search for the smallest value of $p$ such that the two-level sampling scheme succeeds: namely, it gives an error that is no worse than that of full sampling (by full sampling, we mean the scheme that uses all possible samples in the resolution range, i.e. that based on $\Omega = \{1, \ldots, N\}$).

In Figure 4(b) we plot $p$ against the resolution level $N$. For each $N$ we have the same sampling device (Fourier samples), the same recovery algorithm (compressed sensing with two-level sampling) and the same function to recover. However, the difference between low resolution ($N = 128$) and high resolution ($N = 4096$) is dramatic. For the former, we require nearly 60% of the samples, whereas this value drops to less than 8% as we move to the higher resolution.
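A two-level index set of the kind used in this example can be generated as in the sketch below. It assumes that the fixed samples are the first $p/2\%$ of the indices $\{1, \ldots, N\}$ (the ordering of the Fourier samples is not specified here) and that the random samples are drawn uniformly without replacement from the rest; the function name and the seed are hypothetical.

```python
import random

def two_level_scheme(N, p, seed=0):
    """Indices (1-based) of a two-level scheme using p percent of {1, ..., N}:
    the first p/2 percent are taken in full, and a further p/2 percent are drawn
    uniformly at random, without replacement, from the remaining indices."""
    half = round(N * p / 200.0)            # number of samples in each half
    fixed = list(range(1, half + 1))
    rng = random.Random(seed)
    randomised = rng.sample(range(half + 1, N + 1), half)
    return sorted(fixed + randomised)

omega = two_level_scheme(N=4096, p=8.0)
print(len(omega), omega[:5])
```

The search for the smallest successful $p$ would wrap such a generator in a loop over $p$, with a reconstruction step that is not shown here.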



5.3 Second consequence: the optimal sampling strategy depends on the 
signal structure 

Theorems 4.7 and 4.8 demonstrate that the required sampling density at the $k^{\mathrm{th}}$ level is determined by the quantities $\mu_{\mathbf{N},\mathbf{M}}(k,l)$, $\mu_{N_{k-1}}$ and $s_k$. Hence the optimal sampling strategy is completely dependent on the signal structure. Specifically, it is determined by the widths $M_k - M_{k-1}$ of the sparsity bands, and the sparsity levels within them.






Figure 5: (a) Wavelet coefficients and subsampling reconstructions from 10% of Fourier coefficients. (b) Reversed wavelet coefficients and the reconstructions from the reversed coefficients. Left: Daubechies 4 wavelet coefficients of the analytic phantom (see Section 7.1). Center: reconstruction from 10% Fourier coefficients from a multi-level subsampling scheme (see Section 7.2). Right: reconstruction from 10% Fourier coefficients from a subsampling scheme using a continuous power law (see Section 7.3). Note that had the RIP held, the upper and lower figures would have been identical.



This phenomenon can be illustrated by the following example: 

Example 5.2 In Figure 5 we consider the reconstruction of a phantom brain image (see Section 7.1 for a discussion). As can be seen in Figure 5(a), this image is asymptotically sparse in wavelets, and can therefore be recovered well using either a multilevel or power law subsampling scheme. Now suppose we perform the following experiment. We reverse the ordering of the first 1024 wavelet coefficients (Figure 5(b)) in order to obtain a new function $\tilde{f}$, and apply the same sampling patterns to recover $\tilde{f}$ from its Fourier measurements. Having done this, we once more reverse the order of the (reconstructed) wavelet coefficients so as to obtain a reconstruction of $f$. The result of this process is shown in Figure 5(b). As is evident, this gives a completely meaningless reconstruction. In particular, the same sampling pattern and the same total sparsity, but a different signal structure, yield highly contrasting results.
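The coefficient reversal used in this experiment preserves the total sparsity while destroying the distribution of nonzeros across levels. The following sketch illustrates this on a hypothetical coefficient vector with three hypothetical bands; it does not perform any reconstruction.

```python
def flip_leading(coeffs, count):
    """Reverse the order of the first `count` coefficients, leaving the rest untouched."""
    return coeffs[:count][::-1] + coeffs[count:]

def nonzeros_per_band(coeffs, band_edges):
    """Count nonzero entries in each band of indices [band_edges[k-1], band_edges[k])."""
    counts, start = [], 0
    for end in band_edges:
        counts.append(sum(1 for c in coeffs[start:end] if c != 0))
        start = end
    return counts

# Hypothetical asymptotically sparse vector: dense coarse band, sparse fine bands.
coeffs = ([1.0] * 16
          + [1.0 if i % 8 == 0 else 0.0 for i in range(48)]
          + [1.0 if i % 64 == 0 else 0.0 for i in range(960)])
edges = [16, 64, 1024]
flipped = flip_leading(coeffs, 1024)
print(nonzeros_per_band(coeffs, edges), nonzeros_per_band(flipped, edges))
# prints [16, 6, 15] [0, 1, 36]
```

Both vectors have 37 nonzero entries, but after the flip almost all of them sit in the finest band, which is precisely the band that a multilevel scheme samples most sparsely.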



5.4 Third consequence: no Restricted Isometry Property (RIP)

The informed reader will have noticed that our main theorems do not involve the Restricted Isometry Property (RIP), a standard technique in compressed sensing theory [25], [26], [27]. Instead, our results are based solely on the coherence (or asymptotic/local coherence) of the matrix $U$. In this sense, our work continues the development of the RIPless theory of compressed sensing, as introduced by Candes & Plan in [12].

However, since the RIP-based techniques are well established, it is worth posing the following question: is the RIP relevant for imaging problems? That is to say, for realistic parameter values (i.e. problem resolution, subsampling percentage, and sparsity) used in applications, can one reasonably expect the RIP to hold? It is our belief that the answer to this is no. Had such matrices satisfied the RIP, one could permute the wavelet coefficients of an image and still recover it with the same accuracy. Figure 5 shows that this is not the case: it is clear that there is no RIP. The experiment in Figure 5 was repeated up to large subsampling percentages (50%) with many different images, but with the same outcome.

In view of this, the third conclusion of our work is that the RIP is of limited value in analysing 
compressive imaging strategies. Coherence, on the other hand, is both a relevant and powerful 
tool for understanding recoverability in this setting. 

6 Proofs 

The proofs rely on some key propositions from which one can deduce the main theorems. The main work is to prove these propositions, and that will be done subsequently.

6.1 Key results 

Proposition 6.1. Let U G B{P{N)) and suppose that A and ft = ftiU . . .Uftr (where the union 
is disjoint) are subsets ofN. Let xq e H and z G ran(PQ/7) be such that \\z\\ < 6 for ^ > 0. Let 
M G N and y = PqUxq + z and i/m = PqUPmXq + z. Suppose that ^ G H and (^m satisfiy 

mi.= mi{Mi.:\\PaUr,-y\\<5}. (6.1) 
UmU = inf {ll^lli^ : WPnUrj - VmW < S}. (6.2) 



If there exists a vector p = U*P^w such that 
(t) \\{P^U* {q^'Pn, ® . . . ® q-'Pn^ f/PA|p,(H))-^|| < | 

(i*) \\PaU* {qT'Pn, © . . . ® q-'Pn^) \\ < ^ 

(ii) ||PA/3-sgn(PAa;o)|| < f- 
(ill) \\Pip\\i^ < \ 

(iv) \\w\\ <L■^/\A\ 
then we have that 



U-xo\\<C-(^l+^y{5-{l + L-V~s) + WP^Xoy) , 
for some constant C . Also, if (Hi) is replaced by ||Pm^aP||^°° ^ \ then 

Um - xoII < C ■ (^1 + -^^ ■ ((5 ■ (1 + L ■ Vi) + WPuPk^oWn) ■ (6.3) 
Proof. Suppose that there exists a vector p, constructed with y^ = Pa^o, satisfying (i), (i*)-(iv). 



Let ^ be a solution to (6.1) and let h = ^-xq. Let Aa = Pa^* {Qi ^Pqi © ... © Qr^PQr-) UPA\p^{n)- 
Then, the following holds: 

\\PAh\\ = WA^'AAPAhW 

< \\AI^\\\\PaU* {q^'Pn, © ... © U{I - Pk)h\\ 

< IwPaU* {q^'Pn, © ... © q-'Pn.) \\ {\\P^Uh\\ + ||P^/i||) (6.4) 
<^^{25+\\Pkhh.), 






where the second inequahty follows from (i) and the last inequality follows from (i*). We will now 
obtain a bound for ||P^/i||;i. In particular, 

\\h + xoWii = WPaIi + PaXoU + \\PA{h + a;o)||/i 

> Re(PA/i,sgn(PAa;o)) + \\Paxo\\ii + \\P^h\\ii - \\P^xo\\ii (6.5) 

> Re(PA/i,sgn(PAa;o)) + ||a;o||;i + WPA^h " '^WP^xoh 
Since ||a;o||;i > + a;o||;i , we have that 

WPkhh < \{PAh,sgn{PAXo))\+2\\Pixo\\ii. (6.6) 

We will use this equation later on in the proof, but before we do that observe that some basic 
adding and subtracting yields 

\{PAh,sgn{xo))\ < \{PAh,sgn{PAXo) - Pap)\ + \{h,p)\ + \{Pih,P^p)\ 

< ||PA/i||||sgn(PAa;o) - PapW + \{PnUh,w)\ + ||Pip|b~ 
<^\\PAh\\+2LSV~s+\\\P^hh^ (6.7) 

< ^ (2,5 + \\Pih\U^) + 2LSV-S + \ \\Pih\\i^ 

where the last inequahty utihses ( |6.4[ ) and the penultimate inequality follows from properties (ii), 
(iii) and (iv) of the dual vector p. Combining with (6.6) gives that 



\Pih\\l^<5['^-^ + BL^s]+^Ptx4l^. (6.8) 



Thus, (6.4) and (6.8) yields: 



\h\\<^^{25+\\PthU) + \\Pih\\i. 



□ 



The next two propositions give sufficient conditions for Proposition 6.1 to be true. But before 
we state them we need to define the following. 

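Definition 6.2. Let $U$ be an isometry of either $\mathbb{C}^{N \times N}$ or $\mathcal{B}(\ell^2(\mathbb{N}))$. For $\mathbf{N} = (N_1, \ldots, N_r) \in \mathbb{N}^r$, $\mathbf{M} = (M_1, \ldots, M_r) \in \mathbb{N}^r$ with $1 \leq N_1 < \ldots < N_r$ and $1 \leq M_1 < \ldots < M_r$, $\mathbf{s} = (s_1, \ldots, s_r) \in \mathbb{N}^r$ and $1 \leq k \leq r$, let

$$\kappa_{\mathbf{N},\mathbf{M}}(k,l) = \max_{\eta \in \Theta} \| P^{N_{k-1}}_{N_k} U P^{M_{l-1}}_{M_l} \eta \|_{\ell^\infty} \cdot \sqrt{\mu(P^{N_{k-1}}_{N_k} U)},$$

where

$$\Theta = \{ \eta : \|\eta\|_{\ell^\infty} \leq 1, \ |\mathrm{supp}(P^{M_{l-1}}_{M_l} \eta)| = s_l, \ l = 1, \ldots, r \},$$

and $N_0 = M_0 = 0$.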

Proposition 6.3. Let U G B{P(N)) be an isometry and x G /""^(N). Suppose that = r^N,m is 
a multilevel sampling scheme, where N = {Ni^ . . . , A/"^) G and m = (mi, . . . ,m^) G N^. Let 
(s,M), where M (Mi, . . . ,M^) G N'^; Mi < . . . < Mr, and s (si, ... ,5^) G W , he any pair 
such that the following holds: 



(i) The parameters 

N'=Nr, K := max " " 



k-l 1 



/c=i,...,r [ rrik 

satisfy the weak balancing property with respect to U , M := Mr and s := Si 






(ii) for e > and 1 < k < r, 

1 > {\og{se-') + 1) • ~ • {ukA + . . . + i^k^r) • log (KM^s) , (6.10) 

Uk,i = min(/iN,M(A^, • si, /^n,m(A^, 0); 
mfc > (log(56-^) + 1) • mfc • log (i^Mv^) , (6.11) 

where rhk satisfies 

1 > 1 • /iATo • Si + . . . + 1 • llNr-i • ^r, 

V mi y V y 

si + . . . + < si + . . . + Sr, Sk < Sk{si 
and Sk is defined in U-^- 



, . . . , Or J-, 



Then (i), )-(iv) in Proposition \6.1\ follow with probability exceeding 1 — e, with (Hi) is replaced 
by \\PmP^p\\i^ < I cind L in (iv) is given by 



U = — ^k-i for all 1 < k < r then (i), (i* )-(iv) follow with probability one. 

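Proposition 6.4. Let $U \in \mathcal{B}(\ell^2(\mathbb{N}))$ be an isometry and $x \in \ell^1(\mathbb{N})$. Suppose that $\Omega = \Omega_{\mathbf{N},\mathbf{m}}$ is a multilevel sampling scheme, where $\mathbf{N} = (N_1, \ldots, N_r) \in \mathbb{N}^r$ and $\mathbf{m} = (m_1, \ldots, m_r) \in \mathbb{N}^r$. Let $(\mathbf{s}, \mathbf{M})$, where $\mathbf{M} = (M_1, \ldots, M_r) \in \mathbb{N}^r$, $M_1 < \ldots < M_r$, and $\mathbf{s} = (s_1, \ldots, s_r) \in \mathbb{N}^r$, be any pair such that the following holds:

(i) The parameters $N$ and $K$ (as in Proposition 6.3) satisfy the strong balancing property with respect to $U$, $M = M_r$ and $s := s_1 + \ldots + s_r$;

(ii) for $\epsilon > 0$ and $1 \leq k \leq r$,

$$1 \gtrsim (\log(s\epsilon^{-1}) + 1) \cdot \frac{N_k - N_{k-1}}{m_k} \cdot (\nu_{k,1} + \ldots + \nu_{k,r}) \cdot \log(K \tilde{M} \sqrt{s}), \qquad (6.13)$$

where $\nu_{k,l}$ is as in Proposition 6.3;

(iii)

$$m_k \gtrsim (\log(s\epsilon^{-1}) + 1) \cdot \hat{m}_k \cdot \log(K \tilde{M} \sqrt{s}), \qquad (6.14)$$

where $\tilde{M} = \min\{ i \in \mathbb{N} : \| P_N U P^{\perp}_{i-1} \| \leq 1/(32 K \sqrt{s}) \}$.

Then (i), (i*)-(iv) in Proposition 6.1 follow with probability exceeding $1 - \epsilon$, with $L$ as in (6.12). If $m_k = N_k - N_{k-1}$ for all $1 \leq k \leq r$ then (i), (i*)-(iv) follow with probability one.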

Lemma 6.5 (Bounds for /^n,m(^, ^k)* For /c = 1, . . . , r — 1^ if 

then 



ana 



^n,m(A:, /) < 7 • J^Ji{Pk,_U), I < k (6.15) 



Sk+i < 2 [sk+i + . . . + s^) 






Proof. Let \\P^ UPMkW = 7/c and note that 



/^n,m(^,0 < • V^i H ^si- Jii{P^^_U), I < k 



and 



<max(||P4/7PM,r/|| 



□ 



We are now ready to prove the main theorems. 



Proof of Theorems 4-2 and 4-4 • It is straightforward to show that Theorem 4.2 follows from The- 



orem |44j thus we will concentrate on proving the latter. Note that Proposition |6.3| applied to a 
two-level sampling scheme Q = r^N,m5 where N = {Ni^N2) and m = (mi, 7712) with mi = Mi 
and m2 = m, and x G which is (s, M)-sparse, where r = 2, s = (Mi,S2), M = (Mi,M2), 

and A^i, A^2, ^1, ^2 ^ N are such that 



N = N2, K 



N2-N: 



1712 



mi J 



satisfy the weak balancing property with respect to [/, M = M2 and s = Mi + 52, yield the 
following: (i), (i*)-(iv) in Proposition |6. 1 1 follow with probability exceeding 1 — e, with (iii) replaced 
by WPmPaPWi^ < ^ and L in (iv) is given by Kl^, if 



1 > (log(^e-i) 



1) {U2,l 

1712 



i/2,2)-log {KM^), 



where rhk satisfies 



m2 > (log(se"^) + 1) • m2 • log {KMy/s) 

' Nr - Nr-l 



(6.16) 
(6.17) 



1 > 



rrir 



1 • MATi • S2, 



where 52 < *S'2(5i,52), and Sk is defined in (4.6). But via the assumption that \\ P^^ UP mi \\ < 



7/VM1 for some 7 < 2/5 and Lemma 6.5 we observe that (4.4) implies (6.16) and (6.17). Thus, 



the theorem now follows from Proposition 6.1 



□ 



The proof of Theorem 4.5 is almost verbatim the proof of Theorem 4.4, with the exception that all references to Proposition 6.3 are replaced by references to Proposition 6.4. We omit the details.

Proof of Theorem 4.7 and Theorem 4.8. It is straightforward that Theorem 4.7 follows from Theorem 4.8, and a direct application of Proposition 6.4 and Proposition 6.1 completes the proof. □

What is left is now to prove Propositions 6.3 and 6.4, and that will be done in the next sections.



6.2 Preliminaries 

Before we start on the rather long way to prove the propositions, let us recall one of the monumental theorems in probability theory that will come in handy later on.

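Theorem 6.6. (Talagrand [43]) There exists a number $K$ with the following property. Consider $n$ independent random variables $X_i$ valued in a measurable space $\Omega$. Consider a (countable) class $\mathcal{F}$ of measurable functions on $\Omega$. Consider the random variable $Z = \sup_{f \in \mathcal{F}} \sum_{i \leq n} f(X_i)$. Consider

$$S = \sup_{f \in \mathcal{F}} \|f\|_\infty, \qquad V = \sup_{f \in \mathcal{F}} \mathbb{E}\Bigl( \sum_{i \leq n} f(X_i)^2 \Bigr).$$

If $\mathbb{E}(f(X_i)) = 0$ for all $f \in \mathcal{F}$ and $i \leq n$, then, for each $t > 0$, we have

$$\mathbb{P}(|Z - \mathbb{E}(Z)| \geq t) \leq 3 \exp\Bigl( -\frac{t}{KS} \log\Bigl( 1 + \frac{tS}{V + S\,\mathbb{E}(\bar{Z})} \Bigr) \Bigr),$$

where $\bar{Z} = \sup_{f \in \mathcal{F}} \bigl| \sum_{i \leq n} f(X_i) \bigr|$.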



Equipped with this sledgehammer we can now get to work. We will present the following theorem as well as two technical propositions that will serve as the main tools in the proofs of the main propositions. A crucial tool in the proofs of the propositions is the Bernoulli sampling model. We will use the notation $\{a, \ldots, b\} \supset \Omega \sim \mathrm{Ber}(q)$, where $a < b$, $a, b \in \mathbb{N}$, when $\Omega = \{k : \delta_k = 1\}$ and $\{\delta_k\}_{k=a}^{b}$ is a sequence of Bernoulli variables with $\mathbb{P}(\delta_k = 1) = q$.

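Definition 6.7. Let $r \in \mathbb{N}$, $\mathbf{N} = (N_1, \ldots, N_r) \in \mathbb{N}^r$ with $1 \leq N_1 < \ldots < N_r$, $\mathbf{m} = (m_1, \ldots, m_r) \in \mathbb{N}^r$, with $m_k \leq N_k - N_{k-1}$, $k = 1, \ldots, r$, and suppose that

$$\Omega_k \subset \{N_{k-1} + 1, \ldots, N_k\}, \qquad \Omega_k \sim \mathrm{Ber}\Bigl( \frac{m_k}{N_k - N_{k-1}} \Bigr), \qquad k = 1, \ldots, r,$$

where $N_0 = 0$. We refer to the set

$$\Omega = \Omega_{\mathbf{N},\mathbf{m}} := \Omega_1 \cup \ldots \cup \Omega_r$$

as an $(\mathbf{N}, \mathbf{m})$-multilevel Bernoulli sampling scheme.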
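The definition above translates directly into a sampling routine. The sketch below is one possible realisation using Python's standard pseudo-random generator; the function name and the seed argument are hypothetical. Note that under the Bernoulli model the number of samples drawn in level $k$ is random, with expectation $m_k$.

```python
import random

def multilevel_bernoulli(N_levels, m_levels, seed=0):
    """Draw Omega = Omega_1 U ... U Omega_r, where each index in
    {N_{k-1}+1, ..., N_k} is kept independently with probability
    q_k = m_k / (N_k - N_{k-1}), and N_0 = 0."""
    rng = random.Random(seed)
    omega, prev = [], 0
    for N_k, m_k in zip(N_levels, m_levels):
        width = N_k - prev
        if not 0 <= m_k <= width:
            raise ValueError("need 0 <= m_k <= N_k - N_{k-1}")
        q_k = m_k / width
        # rng.random() lies in [0, 1), so q_k = 1 keeps every index of the level.
        omega.extend(j for j in range(prev + 1, N_k + 1) if rng.random() < q_k)
        prev = N_k
    return omega

# Hypothetical two-level example: full sampling of the first level, sparse second level.
print(multilevel_bernoulli([8, 64], [8, 14], seed=1))
```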

Theorem 6.8. Let U G B{P{N)) be an isometry. Suppose that ft = r^N,m '^s a multilevel Bernoulli 
sampling scheme, where N = {Ni, . . . , Nr) G and m = (mi, . . . , m^) G N^. Consider (s, M)^ 
where M = (Mi, . . . , Mr) eW, Mi < ... < Mr, and s = (si, . . . , s^) G W, and let 

A = Ai U . . . U A^, Afc c {Mfc_i, . . . , Mfc}, |Afc| = sj, 

where Mq = 0. // \\PmU*PnUPm^ - PmA < 1/8 then, for 7 G (0, 1), 

nil^At/*(^r^Pn, © ... © q-^Pn^UP^ - ^aII > 1/4) < 7, (6.18) 

where q^ = mk/{Nk — A^/c-i), provided that 

1 ^ ^^^^ — • (^fe,i + . . . + J^k,r) • (log (7"^ + 1) ' ^k,i = min(/iN,M(^, • /^n,m(^, 0)- 

m/e ' \ \ / / ' 

(6.19) 

In addition, if q = 1, then 

nil^A/7*(gr^P^, © ... © q;'Pn^)UPA -Pa\\> 1/4) = 0. 

To prove this theorem we deliberately avoid the use of the Matrix Bernstein inequality [28]. The reason is that this inequality is dimension dependent and hence yields dimension dependent estimates if used to prove the main theorems (see the dimension dependent bounds on the probability in [12], for example). As we are proving theorems for infinite-dimensional problems, it is most unnatural to use dimension dependent results, and we therefore rely on Talagrand's dimension free theorem. However, before we can prove our theorem, we need a technical lemma.



Lemma 6.9. Let U G B{1 (N)) with \\U\\ < 1, and consider the setup in Theorem 6.18. Let 
{Sj}jLi be random independent Bernoulli variables with F{dj = 1) = g^, qj = mk/{Nk — N^-i) 

and j G {A'^^-i + 1, . . . , A'fe}, cind define Z = J^_^i Zj^ Zj = {qJ^Sj — l) r]j 0f]j and r]j = PaU^cj. 
Then 

E(||Z||)^<241og(|A|) ma^^{q-%,f}, 

when 

\ogi\A\)-'>9ma^^{q-'\\Vjf}- 






The proof of this lemma is essentially a reworking of a result by Rudelson [40], but we need to
include it here as the setup deviates from similar results in p!3l and [2 . 

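Lemma 6.10. (Rudelson) Let $\eta_1, \ldots, \eta_M \in \mathbb{C}^n$ and let $\epsilon_1, \ldots, \epsilon_M$ be independent Bernoulli variables taking values $1, -1$ with probability $1/2$. Then

$$\mathbb{E}\Bigl( \Bigl\| \sum_{i=1}^{M} \epsilon_i \, \eta_i \otimes \bar{\eta}_i \Bigr\| \Bigr) \leq \frac{3}{2} \sqrt{\log(n)} \, \max_{i \leq M} \|\eta_i\| \, \sqrt{ \Bigl\| \sum_{i=1}^{M} \eta_i \otimes \bar{\eta}_i \Bigr\| }.$$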



Lemma 6.10 is often referred to as Rudelson's Lemma ^40^, however, we are relying on a 
complex version that was proven by Tropp ^j, where also the constant 3/2 was established. 



Proof of Lemma \6^ We we start by observing that by letting S = {Sj}jLi be independent copies 
of 5 = {Sj}f^^. Then, since E(Z) = 0, 



\Z\\) = 



N 



(6.20) 



by Jensen's inequality. Let e = {ej}jLi be a sequence of Bernoulli variables taking values ±1 
with probability 1/2. Then, by ( |6.20[ ), symmetry, Fubini's Theorem and the triangle inequality, 
it follows that 



E.(||^||)<: 



< 2E5 



E^ E^~ 



N 



N 



(6.21) 



Note that the setup in (6.21) is now ready for the use of Rudelson's Lemma (Lemma 6.10). 
However, as specified before, it is the complex version that is crucial here. Now, by Lemma [6.10 
we get that 



E. 



N 



< - V'log(s) max q~. 
2 i<i<Ar 



■1/2, 



Vj\ 



\ 



N 



(6.22) 



And hence, by using (6.21) and (6.22), it follows that 



|Z||)<3yi^ max fl-^/'||r?,-| 

1<J<N ^ 



\ 





N 






Z^"^r]j^ f]j 


) 









Thus, by using the easy calculus fact that if $r > 0$, $c \leq 1$ and $r \leq c\sqrt{r+1}$ then $r \leq c(1+\sqrt{5})/2$, together with the fact that $U$ is an isometry (so that $\| \sum_{j=1}^{N} \eta_j \otimes \bar{\eta}_j \| \leq 1$), it is easy to see that the claim follows. □
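For completeness, the calculus fact invoked at the end of the proof can be checked by squaring and solving a quadratic inequality. The following derivation is a sketch supplied for the reader's convenience; it uses only the stated hypotheses $r > 0$ and $c \leq 1$ (note that $r > 0$ forces $c > 0$), and the last step uses $c \leq 1$ inside the bracket.

```latex
r \le c\sqrt{r+1}
\;\Longrightarrow\; r^2 - c^2 r - c^2 \le 0
\;\Longrightarrow\; r \le \frac{c^2 + \sqrt{c^4 + 4c^2}}{2}
  = \frac{c\,\bigl(c + \sqrt{c^2 + 4}\bigr)}{2}
  \le \frac{c\,(1 + \sqrt{5})}{2}.
```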



We are now ready to prove Theorem 6.8 



Proof of Theorem \6.8\ Let {Sj}jL-^ be random Bernoulli variables as defined in Lemma 
define Z = Xl^i — {qJ^^j — f) Vj Vj with r]j = P/\U^ej. Now observe that 



6.9 



and 



N N 

PAU*{q^^Pn, e . . . e q-^Pn.)UPA = ^ qJ^SjTjj 7),, P^U^'PnUP^ = Y.Vj ^ Vj- (6.23) 






Thus, it follows that 

\\PAU*{q^'Pn, ® . . .®qT'Pn,)UPA - Pa\\ < \\Z\\ + \\{PaU*PnUPa - Pa)\\ 

1 (6.24) 
< 11^11 + 8' 

by the assumed weak balancing property. Thus, to prove the assertion we need to estimate ||Z||, 
and Talagrand's Theorem |6.6| will be our main tool. Note that clearly, since Z is self-adjoint, we 
have that \\Z\\ = sup^^^ K^C? 01^ where ^ is a countable set of vectors in the unit ball of Pa{1-L) 
. Define, for G ^, the mappings 

Ci(T) = {rc,C), C2(r) = -{rc,C), TeB{n). 

In order to use Talagrand's Theorem |6.6| we restrict the domain V of the mappings Q to 

V = {T(,Bm:\\T\\< m^ {q-'\\r,,r}]. 

1<J<N 

Let denote the family of mappings Ci, (2 for C ^ G- Then \\Z\\ = sup^^^-^ ({Z)^ and we have 

|Ci(^i)| = \iq-'5^ - 1)1 K(r?i ® %)C,C>I < m^^{q-'Hf}- 
Thus, Zj G V for 1 < j < N and S := sup^^^jr ||C||oo = niaxi<j<Ar{^~"'^ Note that 

r 

q-'Hf = q-'{PAU*ej,PAU*e,) = qj' J2{PA,U*e„ PA,U*e,) . 

k=l 

Also, note that an easy application of Holder's inequality gives the following (note that the and 
bounds are finite because all the projections have finite rank), 

\{PA,U*e„PA,U*ej)\ < ||PA.C/*e,||,i||PA.f/*e,-||;» 

< \\PA,u*p;^;-'y^i4PA,u*ej\\i^ < \\p;:^;-'upaJi^^i^ • y^M^^.^f^) = «N,Ma,fc), 

for j G + 1, . . . , Ni} and / G {1, . . . , r}. And finally observe that 

|(PA,/7*e„PA,t/*e,)| < ^{P^l-'UP^l-')-Sk < /iN,MG, ^) • ^fe. 
Hence, it follows that 

\\r]j\\^ < max {vk,i + • • • + z^/c,r), T^k.i = min(/iN,M(A:, • 5/, /^n,m(/^^, 0) (6-25) 

l<k<r 



and therefore S < maxi</e<^ q^^ ^{^k,i + • • • + i^k,r)- Finally, note that by (6.25) and the reasoning 
above, it follows that 

V:= supE ) < supE -l)'|(PAt/*e„C)|' 

< max llr^fcll 1 sup > |(e^-, [/PaC)I , 

^<k<r \ rrik J ^,eT^^ 

< max + . . . + i^/c,r) = max (i^/c,l + • • • + ^/c,r), 

i<k<r rrik i<k<r rrik 



where we used the fact that U is an Isometry to deduce that \\U\\ = 1. Also, by Lemma 6.9 and 
(6.25) , it follows that 

E(||Z||)'<24 max ~ ^^"^ + . . . + i^k^r) ■ log{s) (6.27) 
i<k<r rrik 






when 



1 > 9 max [UkA + • • • + i^k,r) • log(s). 



(6.28) 



Thus, by (6.24) and Talagrand's Theorem 6.6, it fohows that 

F {\\PAU*{q^^Pn, ® . . . ® g-^PoJC/PA - Pa|| > 1/4) 
1 



< 



Z > 



< i^exp 



16 
1 



24 max ^ — + • • • + J^k,r) • log(s) 
i<fe<r rrik 



max 



Nk - Nk-i 



16K \i<k<r rrik 



+ . . . + ^fc,,) log (1 + 1/32) , 



(6.29) 



when m/e's are chosen such that the right hand side of (6.27) is less than or equal to j^. This 
yields the first part of the theorem. The second claim of this theorem follows from the Balancing 
property. □ 

Proposition 6.11. Let U G S(/^(N)) he an isometry. Suppose that Vt = r^N,m 'is a multilevel 
Bernoulli sampling scheme, where N = {Ni^ . . . , A^^) G and m = (mi, . . . , m^) G N^. Consider 
(s, M), where M (Mi, . . . , Mr) G N'', Mi < . . . < Mr, and s = (si, . . . , 5^) G W, and let 

A = Ai U . . . U A^, A/, c {M/,_i, . . . , Mfc}, |A/,| = 5/, 

where Mq = 0. Let p > 1/4. 



N := TV^, := max 

/c=l,...,r 



/c-l 1 



satisfy the weak balancing property with respect to U, M := Mr and s := si 
for ^ G H and t^j>0, we have that 



provided that 



l0g(^(M-5) 

/or some constant C > {), where 



>C A, 



log(i(M-.)) 



> C T, 



-5^; then, 
(6.30) 

(6.31) 



A = max j — — ^^^^ • {uk,! + . . . + i^k^r) } , J^k,i = min(/iN,M(^, ' ^n,m(^, 0)- 

l<k<r ' 



keg 



ruk 



where Q = {k : ruk < N}^ — A^/c_i, /c = 1, . . . , r} and 



T = 



A/'i-A^Q 
mi 



1 • /iATo • 51 + . . . 



- Nr-l 

rrir 



1 • I^Nr-i • ^r, 



(6.32) 
(6.33) 



for some {sk}l.^i such that 

5i + . . . + 5r < Si 



Sk < Sk{si, . . . ,Sr). 



Moreover, if = 1 for all k = 1, . . . ^r, then (6.31) is trivially satisfied for any 7 > and 
the left-hand side of (6.3C^ is equal to zero. 

(a) If N satisfies the strong Balancing Property with respect to [/, M and s, then, for £^ G H and 
t, 7 > 0; we have that 



\PtU*{q^^Pn, ® . . . ® q;^Pn^UPU\\i^ > /3||^|h=.) < 7, 



(6.34) 






provided that 



/3 



>C A, 



> C T, 



(6.35) 



log(i(0~-s)) 

for some constant C > 0, = ^({^/c}fc=i, 1/8, {A^/c}^=i, s, M) and T, A as defined in (i) and 
0{{qk}U,.t,{N,}U,,s,M) 

zeN: max \\Pr,U%q^' Pr,^, ^ . . . ^ q'' Pr,^^)Ue,\\ > ^ 

ric{i,..-,M}, |ri|=s ^ys 

r2,jC{Nj_^ + l,...,Nj}, j = l,...,r 



Moreover, if q^ = I for all k = 1^ . . . ^r, then (6.35) is trivially satisfied for any 7 > and 
the left-hand side of (6.34) ^-^ equal to zero. 

Proof. To prove (i) we note that, without loss of generahty, we can assume that = 1. Let 

{Sj}jLi be random Bernouhi variables with F{Sj = 1) = qj = q^^ for j G {A^/c-i + 1, . . . , A/'/e} and 
1 < k < r. A key observation that will be crucial below is that 

N 

PiU'^iqi'Pn, © ... © q;:'Pn.)UPAi = ^PiW^q-'Sjicj e,)t/PA^ 

N 

= ^AU^q-'Sj - l)(e, e,)/7PA^ + PtU'' PnUP^^^. 

(6.36) 



We will use this equation at the end of the argument, but first we will estimate the size of the 
individual components 
the random variables 



individual components of XljLi ^^3 ~ ©ej)/7PA^. To do that define, for 1 < j < N ^ 



and the set 



X] = {U-'iq-^Sj - l)(e, 0e,)/7PAe,e,), i e A' 



g = {j:qj<lJ = l,...,N}. 



Observe that Xj = whenever j ^ Q. 

We will show using Bernstein's inequality that, for each z G and t > 0, 



AT 



> t < 4 exp 



T + At/3 J ' 

To prove the claim, we need to estimate E and First note that, 

E(|Xj|2) = (9-1 - l)Ke„C/PAC)n(e„C/e,)|2 
and recall that |(ej, /7ei)p < /iAr^ for j G {A'/c-i + 1, . . . , A/'^}. Hence, it follows that 

f^E (|xji^) < ^(g-^ - i)/i^,_j|p^r^[/PA^ii^ < sup (^(g-^ - wp^i'^a 



(6.37) 



= {C : Ili^MrXII < V^, 1 < fc < r}. 



Observe that, clearly, the supremum in the above bound is attained for some C ^ © ^iid let 
= WPn^'^UCW^. Hence, we have 



N 



^E(|Xjp)<^(C^-l)/i^,_,5,. 



(6.38) 






Note that it is clear from the definition that Sk < Sk{si^ . . . ^ Sr) for 1 < /c < r. Also, observe that 
from the fact that \\U\\ < 1 and the definition of B, it follows that 

5l + . . . + 5. = E ll^'-^f^^'ACf < WUPACf = llCf <S, + ... + Sr. 

To estimate we start by observing that, by the triangle inequality, the fact that = 1 

and Holder's inequality, it follows that 

|(?,PAf/%)|<EK<r^'^AC/*e,)|, 



and 



\{PM^-'t PAU*ej)\ < \\P^I-'UPa, 11(00^(00, j e {iV(_i + l,...,Ni}, le{l,..., r}, 
\{P;^'^-'^,PAU*ej)\ < AiN,Ma,fc) • Sfe. 



Hence, it follows that for 1 < j < A'' and i G A'^, 
'Nk-Nk 



\X;\ = q-'\{5, - qMi,PAU*ej)\\{e„Ue,)\, 



< max 

l<k<r 
keQ 



ruk 



- • (min{/iN,M(^, 1) • si, a>:n,m(^, !)} + •• • + m,M(A:, r) • s^, /^n,m(A:, r)})| . 

(6.39) 



Now, clearly $\mathbb{E}(X_j^i) = 0$ for $1 \le j \le N$ and $i \in \Delta^c$. Thus, by applying Bernstein's inequality to $\mathrm{Re}(X_j^i)$ and $\mathrm{Im}(X_j^i)$ for $j = 1, \ldots, N$, via (6.38) and (6.39), the claim (6.37) follows.
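For orientation, the factor 4 and the exponent $t^2/4$ in (6.37) are what a standard real-valued form of Bernstein's inequality produces once real and imaginary parts are treated separately. The sketch below assumes that this standard form is the one being applied; the symbols $Y_j$, $\sigma^2$ and $K$ are notation introduced here for illustration (they play the roles of the variance bound $\Upsilon$ and the uniform bound $\Lambda$) and do not appear in the text.

```latex
% Assumed real-valued Bernstein inequality: Y_1,...,Y_N independent, real,
% mean zero, |Y_j| <= K almost surely, and sum_j E(Y_j^2) <= sigma^2.
\[
  \mathbb{P}\Bigl( \Bigl| \sum_{j=1}^N Y_j \Bigr| \ge t \Bigr)
  \le 2 \exp\Bigl( - \frac{t^2/2}{\sigma^2 + K t/3} \Bigr), \qquad t > 0 .
\]
% For complex X_j, |sum X_j| > t forces the real or the imaginary part of the
% sum to exceed t/sqrt(2) in modulus. A union bound over the two parts gives
\[
  \mathbb{P}\Bigl( \Bigl| \sum_{j=1}^N X_j \Bigr| > t \Bigr)
  \le 4 \exp\Bigl( - \frac{t^2/4}{\sigma^2 + K t/(3\sqrt{2})} \Bigr)
  \le 4 \exp\Bigl( - \frac{t^2/4}{\sigma^2 + K t/3} \Bigr),
\]
% using E(Re X_j)^2 <= E|X_j|^2, |Re X_j| <= |X_j| (and likewise for Im X_j).
```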

Now, by (6.37), (6.36) and the assumed weak balancing property (wBP), it follows that

V{\\PMP^U*{q^'Pn,®...®q-^PnJUPAai^>/3) 

>/3 



iGAcn{l,...,M} 
iGAcn{l,...,M} 



AT 



N 



>/3-||PMPit/*P^/7PA| 



<4(M-8)exp (- ^^^^^^3 ). i=\p^ by(6.37),(wBP), 



Also, 



when 



tV4 \ 



<7 



log(-(M-s)) > 



4T 

1^^ 3^y 



And this concludes the proof of (i). To prove (ii), for t > 0, suppose that there is a set At c N 
such that 



(^sup \{PkU*{q^^Pn, ® . . . © q-^PnJUPAV, e,)! >t^=0, \At 



< 00. 



Then, as before, by (6.37), (6.36) and the assumed strong Balancing property (sBP), it follows 






that 



AT 



i=i 



AT 

E^] 

i=i 



> p-\\PiU*PNUPA\ 



<4(|A^|-5)exp 



T + At/3 y 



A by(6.37),(sBP), 



< 7 



whenever 



Hence, it remains to obtain a bound on |A^|. Let 



0{qi,...,qr,t,s) = < 



ieN: max Pr,U%q7^Pr,, 

ric{i,...,M}, |ri|=s 
r2,,c{Ar,_i+i,...,Ar,}, j=i,...,r 



^Qr'Pr,,.)Ue,\\> 



Clearly, C . . . ^Qr^t^ s) and 



||Pr,C/*(c'Pr,,, ©...®g-iPr.,J/7ei|| < max q-'\\PNUPti\\ ^ 

l<j<r 

as i ^ oo. So, |6>(gi, . . . , g^, t, 5)1 < 00. Furthermore, since 6{{qk}l.^i,t, {Nk}l.^i, M) is a 
decreasing function in for alH > |, 

\0{qu ...^qr.t, s)\ < 0{{qk}U,, 1/8, {Nk}U,,s, M) 

thus, we have proved (ii). The statements at the end of (i) and (ii) are clear from the reasoning 
above. □ 



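Proposition 6.12. Consider the same setup as in Proposition 6.11. If $N$ and $K$ satisfy the weak balancing property with respect to $U$, $M$ and $s$, then, for $\xi \in \mathcal{H}$ and $\gamma > 0$, we have

$$\mathbb{P}\left( \| (P_\Delta U^* (q_1^{-1} P_{\Omega_1} \oplus \ldots \oplus q_r^{-1} P_{\Omega_r}) U P_\Delta - P_\Delta) \xi \|_{l^\infty} > \alpha \|\xi\|_{l^\infty} \right) \le \gamma, \qquad \alpha = \left( 2 \log_2^{1/2} (4 \sqrt{s} K M) \right)^{-1}, \tag{6.40}$$

provided that

$$1 \gtrsim \Lambda \cdot (\log(s\gamma^{-1}) + 1) \cdot \log(\sqrt{s} K M), \qquad 1 \gtrsim \Upsilon \cdot (\log(s\gamma^{-1}) + 1) \cdot \log(\sqrt{s} K M),$$

where $\Lambda$ and $\Upsilon$ are as in the setup of Proposition 6.11. Also,

$$\mathbb{P}\left( \| (P_\Delta U^* (q_1^{-1} P_{\Omega_1} \oplus \ldots \oplus q_r^{-1} P_{\Omega_r}) U P_\Delta - P_\Delta) \xi \|_{l^\infty} > \tfrac12 \|\xi\|_{l^\infty} \right) \le \gamma \tag{6.41}$$

provided that

$$1 \gtrsim \Lambda \cdot (\log(s\gamma^{-1}) + 1), \qquad 1 \gtrsim \Upsilon \cdot (\log(s\gamma^{-1}) + 1).$$

Moreover, if $q_k = 1$ for all $k = 1, \ldots, r$, then the left-hand sides of (6.40) and (6.41) are equal to zero.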






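Proof. Without loss of generality we may assume that $\|\xi\|_{l^\infty} = 1$. Let $\{\delta_j\}_{j=1}^N$ be random Bernoulli variables with $\mathbb{P}(\delta_j = 1) = \tilde q_j := q_k$, with $j \in \{N_{k-1}+1, \ldots, N_k\}$ and $1 \le k \le r$. Let also, for $j \in \mathbb{N}$, $\eta_j = (U P_\Delta)^* e_j$. Then, after observing that

$$P_\Delta U^* (q_1^{-1} P_{\Omega_1} \oplus \ldots \oplus q_r^{-1} P_{\Omega_r}) U P_\Delta = \sum_{j=1}^N \tilde q_j^{-1} \delta_j \, \eta_j \otimes \bar\eta_j, \qquad P_\Delta U^* P_N U P_\Delta = \sum_{j=1}^N \eta_j \otimes \bar\eta_j,$$

it follows immediately that

$$P_\Delta U^* (q_1^{-1} P_{\Omega_1} \oplus \ldots \oplus q_r^{-1} P_{\Omega_r}) U P_\Delta - P_\Delta = \sum_{j=1}^N (\tilde q_j^{-1} \delta_j - 1) \, \eta_j \otimes \bar\eta_j + (P_\Delta U^* P_N U P_\Delta - P_\Delta). \tag{6.42}$$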



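As in the proof of Proposition 6.11, our goal is to eventually use Bernstein's inequality, and the following is therefore a setup for that. Define, for $1 \le j \le N$, the random variables

$$Z_j^i = \langle (\tilde q_j^{-1} \delta_j - 1)(\eta_j \otimes \bar\eta_j) \xi, e_i \rangle, \qquad i \in \Delta.$$

We claim that, for $t > 0$,

$$\mathbb{P}\left( \left| \sum_{j=1}^N Z_j^i \right| > t \right) \le 4 \exp\left( - \frac{t^2/4}{\Upsilon + \Lambda t/3} \right), \qquad i \in \Delta. \tag{6.43}$$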



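Now, clearly $\mathbb{E}(Z_j^i) = 0$, so we may use Bernstein's inequality. Thus, we need to estimate $\mathbb{E}(|Z_j^i|^2)$ and $|Z_j^i|$. We will start with $\mathbb{E}(|Z_j^i|^2)$. Note that

$$\mathbb{E}(|Z_j^i|^2) = (\tilde q_j^{-1} - 1) |\langle e_j, U P_\Delta \xi \rangle|^2 |\langle e_j, U e_i \rangle|^2. \tag{6.44}$$

Thus, we can argue exactly as in the proof of Proposition 6.11 and deduce that

$$\sum_{j=1}^N \mathbb{E}(|Z_j^i|^2) \le \sum_{k=1}^r (q_k^{-1} - 1) \mu_{N_{k-1}} \tilde s_k, \tag{6.45}$$

where $\tilde s_k \le S_k(s_1, \ldots, s_r)$ for $1 \le k \le r$ and $\tilde s_1 + \ldots + \tilde s_r \le s_1 + \ldots + s_r$.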
To estimate $|Z_j^i|$ we argue as in the proof of Proposition 6.11 and obtain



\Z'\ < max 

l<k<r 



ruk 



(min{/iN,M(/^, 1) • 51, /^n,m(/^, 1)} + . . . + /iN,M(/^, r) ■ 5^, /^n,m(/^, r)}) 

(6.46) 



} 



Thus, by applying Bernstein's inequality to $\mathrm{Re}(Z_1^i), \ldots, \mathrm{Re}(Z_N^i)$ and $\mathrm{Im}(Z_1^i), \ldots, \mathrm{Im}(Z_N^i)$ we obtain, via (6.45) and (6.46), the estimate (6.43), and we have proved the claim.

Now, armed with (6.43), by (6.36) and the assumed weak balancing property (wBP), it follows that

P {\\PAU%q^'Pn, © ... © q^^Pn^UPA - PA)i\\i- > a) 



ieA 

ieA 
< 4 5 exp 



N 



^ Z} + {{PaU^'PnUPa - Pa)^, e,) 



> a 



N 

E^; 



(6.47) 



>a- WPmU'^PnUPj 



Also, 



T + At/3y 



t = a, by(6.43),(wBP). 



<7, 



(6.48) 






when 



1 > 



4T 4 \ , /4s 
7^ + 3t^j-^°HT 



And this gives the first part of the proposition. Also, the fact that the left-hand side of (6.40) is zero when $q_k = 1$ for $1 \le k \le r$ is clear from (6.48). Note that (ii) follows by arguing exactly as above and replacing $\alpha$ by $\tfrac12$.

□ 



6.3 Proofs of Propositions 6.3 and 6.4



The proof of the propositions relies on an idea that originated in a paper by D. Gross [28], namely, the golfing scheme. The variant we are using here is based on an idea from [2]. However, the informed reader will recognise that the setup here differs substantially from both [28] and [2]. See also [12] for other examples of the use of the golfing scheme.



Proof of Proposition 6.3. We start by mentioning that converting between the Bernoulli sampling model and the uniform sampling model has become standard in the literature. In particular, one can do this by showing that the Bernoulli model implies (up to a constant) the uniform sampling model in each of the conditions in Proposition 6.1. This is straightforward and the reader may
consult [m [131 El] for details. We will therefore consider (without loss of generality) only the 
multilevel Bernoulli sampling scheme.

Recall that we are using the following Bernoulli sampling model: Given $N_0 = 0$, $N_1, \ldots, N_r \in \mathbb{N}$ we let

$$\{N_{k-1}+1, \ldots, N_k\} \supset \Omega_k \sim \mathrm{Ber}(q_k), \qquad q_k = \frac{m_k}{N_k - N_{k-1}}.$$
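The multilevel Bernoulli model just recalled can be illustrated with a short simulation. This is a minimal sketch, not the authors' code: the function name `multilevel_bernoulli`, the example level boundaries and the sample budgets are all hypothetical, and each index in band $k$ is simply kept independently with probability $q_k = m_k/(N_k - N_{k-1})$.

```python
import random

def multilevel_bernoulli(levels, m, seed=0):
    """Draw Omega_1, ..., Omega_r for levels = [N_0 = 0, N_1, ..., N_r].

    Each index j in {N_{k-1}+1, ..., N_k} is kept independently with
    probability q_k = m_k / (N_k - N_{k-1}); m = [m_1, ..., m_r].
    """
    if levels[0] != 0 or len(m) != len(levels) - 1:
        raise ValueError("need levels = [0, N_1, ..., N_r] and one m_k per level")
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    omega = []
    for k in range(1, len(levels)):
        band_size = levels[k] - levels[k - 1]
        q_k = m[k - 1] / band_size
        if not 0 < q_k <= 1:
            raise ValueError("need 0 < m_k <= N_k - N_{k-1}")
        band = range(levels[k - 1] + 1, levels[k] + 1)
        omega.append([j for j in band if rng.random() < q_k])
    return omega

# Hypothetical example: three levels, the first one fully sampled (q_1 = 1).
levels = [0, 4, 16, 64]
m = [4, 6, 8]
omega = multilevel_bernoulli(levels, m)
```

Note that under this model the number of samples per level is random with mean $m_k$, which is why the text converts between the Bernoulli and uniform models.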

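Note that we may replace this Bernoulli sampling model with the following equivalent sampling model (see [2]):

$$\Omega_k = \Omega_k^1 \cup \Omega_k^2 \cup \cdots \cup \Omega_k^u, \qquad \Omega_k^j \sim \mathrm{Ber}(q_k^j), \qquad 1 \le k \le r, \tag{6.49}$$

for some $u \in \mathbb{N}$ with $(1 - q_k^1)(1 - q_k^2) \cdots (1 - q_k^u) = (1 - q_k)$. The latter model is the one we will use throughout the proof and the specific value of $u$ will be chosen later. Note also that because of overlaps we will have

$$q_k^1 + q_k^2 + \ldots + q_k^u \ge q_k, \qquad 1 \le k \le r. \tag{6.50}$$

The strategy of the proof is to show the existence of a $\rho \in \mathrm{ran}(U^*(P_{\Omega_1} \oplus \ldots \oplus P_{\Omega_r}))$ that satisfies (i), (i*)-(iv) in Proposition 6.1 with probability exceeding $1 - \epsilon$.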
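The equivalence behind (6.49) rests on the fact that a union of independent Bernoulli sets with probabilities $q^1, \ldots, q^u$ is again Bernoulli with probability $1 - \prod_j (1 - q^j)$. The sketch below checks this numerically for one level. It assumes the split used later in Step I ($q^1 = q^2 = q/4$ and the remaining $u - 2$ probabilities equal); the function name and the example values of $q$ and $u$ are hypothetical.

```python
import math

def split_probabilities(q, u):
    """Split q into q^1, ..., q^u with prod_j (1 - q^j) = 1 - q.

    Assumes q^1 = q^2 = q/4 and q^3 = ... = q^u (equal), as in Step I.
    """
    if not 0 < q <= 1 or u < 3:
        raise ValueError("need 0 < q <= 1 and u >= 3")
    first = q / 4
    # Solve (1 - q/4)^2 * (1 - q_tilde)^(u - 2) = 1 - q for q_tilde.
    ratio = (1 - q) / (1 - first) ** 2
    q_tilde = 1 - ratio ** (1 / (u - 2))
    return [first, first] + [q_tilde] * (u - 2)

qs = split_probabilities(0.3, 8)
union_probability = 1 - math.prod(1 - p for p in qs)
```

Because the sets overlap, the individual probabilities add up to more than $q$, which is the content of (6.50).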

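Step I: The construction of $\rho$: We start by defining $\gamma = \epsilon/6$ (the reason for this particular choice will become clear later). We also define a number of quantities (and the reason for these choices will become clear later in the proof):

$$u = 4\lceil \log(\gamma^{-1}) + v \rceil, \qquad v = \lceil \log_2(8KM\sqrt{s}) \rceil, \tag{6.51}$$

as well as

$$\{q_k^i : 1 \le k \le r, \ 1 \le i \le u\}, \qquad \{\alpha_i\}_{i=1}^u, \qquad \{\beta_i\}_{i=1}^u$$

by

$$q_k^1 = q_k^2 = \tfrac14 q_k, \qquad \tilde q_k = q_k^3 = \ldots = q_k^u, \qquad q_k = (N_k - N_{k-1})^{-1} m_k, \qquad 1 \le k \le r, \tag{6.52}$$

with

$$(1 - q_k^1)(1 - q_k^2) \cdots (1 - q_k^u) = (1 - q_k)$$

and

$$\alpha_1 = \alpha_2 = \left( 2 \log_2^{1/2}(4KM\sqrt{s}) \right)^{-1}, \qquad \alpha_i = 1/2, \qquad 3 \le i \le u, \tag{6.53}$$

as well as

$$\beta_1 = \beta_2 = \tfrac14, \qquad \beta_i = \tfrac14 \log_2(4KM\sqrt{s}), \qquad 3 \le i \le u. \tag{6.54}$$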



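Consider now the following construction of $\rho$. We will define recursively the sequences $\{Z_i\}_{i=0}^u \subset \mathcal{H}$, $\{Y_i\}_{i=1}^u \subset \mathcal{H}$ and $\{\omega_i\}_{i=0}^u \subset \mathbb{N}$ as follows: first let $\omega_0 = \{0\}$, $\omega_1 = \{0, 1\}$ and $\omega_2 = \{0, 1, 2\}$. Then define recursively, for $i \ge 3$, the following: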



fu;,_iU{i} if||(PA-PAC/*(iPoi 
and||PMPiC/*(^Pni^ 



iPo*)f/PA^i-l||(~ </?i||^i-l||(=», 



1^, 



otherwise. 



(6.55) 



i > 1, 



sgn(xo) - PaYz if^ e cji, 
otherwise. 



z > 1 



otherwise, 

Zo = sgn(xo). 



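Now, let $\{A_i\}_{i=1}^2$ and $\{B_i\}_{i=1}^5$ denote the following events

$$\begin{aligned}
A_i &: \ \| (P_\Delta - P_\Delta U^* (\tfrac{1}{q_1^i} P_{\Omega_1^i} \oplus \ldots \oplus \tfrac{1}{q_r^i} P_{\Omega_r^i}) U P_\Delta) Z_{i-1} \|_{l^\infty} \le \alpha_i \|Z_{i-1}\|_{l^\infty}, \qquad i = 1, 2, \\
B_i &: \ \| P_M P_\Delta^\perp U^* (\tfrac{1}{q_1^i} P_{\Omega_1^i} \oplus \ldots \oplus \tfrac{1}{q_r^i} P_{\Omega_r^i}) U P_\Delta Z_{i-1} \|_{l^\infty} \le \beta_i \|Z_{i-1}\|_{l^\infty}, \qquad i = 1, 2, \\
B_3 &: \ \| P_\Delta U^* (\tfrac{1}{q_1} P_{\Omega_1} \oplus \ldots \oplus \tfrac{1}{q_r} P_{\Omega_r}) U P_\Delta - P_\Delta \| \le 1/4, \\
B_4 &: \ |\omega_u| \ge v, \\
B_5 &: \ (\cap_{i=1}^2 A_i) \cap (\cap_{i=1}^4 B_i).
\end{aligned} \tag{6.56}$$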



Also, let $\tau(j)$ denote the $j$th element in $\omega_u$ (e.g. $\tau(0) = 0$, $\tau(1) = 1$, $\tau(2) = 2$ etc.) and finally define $\rho$ by

$$\rho = \begin{cases} Y_{\tau(v)} & \text{if } B_5 \text{ occurs}, \\ 0 & \text{otherwise}. \end{cases}$$

Note that, clearly, $\rho \in \mathrm{ran}(U^* P_\Omega)$, and we just need to show that when the event $B_5$ occurs, then (i), (i*)-(iv) in Proposition 6.1 will follow.

Step II: P5 ^ (i), (i*). To see that the assertion is true, note that if P5 occurs then P3 occurs, 
which immediately implies that ||(Pa/7* (gf^Pj^i © • • • © Qr^Pftr^) ^^a|paCH))"^ II < yielding 
(i). As for (i*), note that if P3 occurs, then it follows that 



|Pa/7* {q^^Pn, © ... © q-^Pn.) UPa\\ < 



(6.57) 



Also, observe that 



\\PaU* {q^'Pn, ® . . . © q-'Pn^) f = \\ {q^'Pn, © ... © Qr^Pn^) UPAf 

= sup \\{qi'Pn,(B...(Bq;'PQ^)UPAvf 
\M\=i 

= sup hk'Pn.UPAvf < I sup ^</,-iPn,C/PAr?f , ^ = max {^} 

= - sup (Pa/7* {y2lk'PnAuPAV,v) < -\\PaU* {q^' Pq, (B . . . ® q-^Pn^) UPa\ 
9|NI=i / 



(6.58) 



Thus, (6.57) and (6.58) imply \\PaU* (C^^fii © ... © q^^P^r) II < and we have shown (i*). 






Step III: $B_5 \Rightarrow$ (ii), (iii). To show the assertion, we start by making the following observations: By the construction of $Z_{\tau(i)}$ and the fact that $Z_0 = \mathrm{sgn}(x_0)$, it follows that



1 



(1) 



Zr{i-l)-PAU*{—^P^.(.) 



-^P^ra,)UPA)Zo 



1 



qr 



1l 



so we immediately get that 



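Hence, if the event $B_5$ occurs, we have, by the choices in (6.53) and (6.54),

$$\|P_\Delta \rho - \mathrm{sgn}(x_0)\| = \|Z_{\tau(v)}\| \le \sqrt{s} \, \|Z_{\tau(v)}\|_{l^\infty} \le \sqrt{s} \prod_{i=1}^{v} \alpha_{\tau(i)} \le \frac{\sqrt{s}}{2^v} \le \frac{1}{8K}, \tag{6.59}$$

since we have chosen $v = \lceil \log_2(8KM\sqrt{s}) \rceil$. Also,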



||PMPip||;~ <^||PMPiC/*(^P^.(.) ©...©^P^.(.))C/PA^.(i-i)||;o 



<li 

V V i—1 

i=l j=l 

1 ^ log2{a) I ^ 

2 



(6.60) 



i=l 



4' 21ogp(a) 23log2(a) 



^)<^, a = 8KV~s. 



In particular, (6.59) and (6.60) imply (ii) and (iii) in Proposition 6.1.
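The last two inequalities in (6.59) are a matter of arithmetic once $v$ and the $\alpha_i$ are fixed. The following sketch checks them for sample values, under two assumptions that are not taken from the text: that every golfing step is accepted (so $\tau(i) = i$), and that $K, M, s \ge 1$. The function name and the numerical values are hypothetical.

```python
import math

def residual_bound(K, M, s):
    """Return (v, sqrt(s) * prod_{i<=v} alpha_i, sqrt(s) / 2^v, 1 / (8K)).

    Uses v = ceil(log2(8 K M sqrt(s))), alpha_1 = alpha_2 =
    1 / (2 sqrt(log2(4 K M sqrt(s)))) and alpha_i = 1/2 for i >= 3,
    with tau(i) = i assumed for illustration.
    """
    root_s = math.sqrt(s)
    v = math.ceil(math.log2(8 * K * M * root_s))
    alpha_12 = 1 / (2 * math.sqrt(math.log2(4 * K * M * root_s)))
    alphas = [alpha_12, alpha_12] + [0.5] * (v - 2)
    product_bound = root_s * math.prod(alphas)
    return v, product_bound, root_s / 2 ** v, 1 / (8 * K)

v, product_bound, geometric_bound, target = residual_bound(K=4, M=1024, s=50)
```

Since $2^v \ge 8KM\sqrt{s}$, the geometric bound is at most $1/(8KM)$, which is where the final estimate $1/(8K)$ comes from.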



Step IV: $B_5 \Rightarrow$ (iv). To show that, note that we may write the already constructed $\rho$ as $\rho = U^* P_\Omega w$ where



w 



E 



1 



Wi, Wi 



.11 



1 



To estimate ||k;|| we simply compute 



11^.11' 



1 



1 



.^1 



r(i) 



1 



r(i) n, 



-(i) 



UPaZ, 



(i-i) 



(O^^r(i-l) 



and then use the assumption that the event B^ holds to deduce that 

V ^/c y \fe=l ^/c / 



max 

l</c<r 



< max 

l<k<r 



^ (||Z.(,_i)||||Z.(,)|| + ||Z.(,_i)|p) 



l</c<r I ^'^C*) 






where the last inequahty fohows from the assumption that the event holds. Hence 

-1 



l^ll < V^y^ f max < 1. — > ^/ai + 1 TT a 

^ \ l<k<r / r(i) -"--^ ■ 



(6.61) 



Note that, due to the fact that $q_k^1 + \ldots + q_k^u \ge q_k$, we have that

ruk 1 



Qk > 



4(7V, - Nk-i) 4 riog(7-i) + [log2(8KMv^)ll -2' 



This gives, in combination with the chosen values of $\{\alpha_j\}$ and (6.61), that

1 



ll^ll < 2V5 max 

l<k<r 



rrik 



3/2 



21og2^^ (4i^Mv^)^ 



\/s max 

l<k<r 



< 2a/s max 

l<k<r 



Nk-Nk-i [3 Vriog(7-') + riog2(8i^Mv^)ll - 2 ^ 1 



log2 (4KMv^) 



i=3 



rrik 



3/2 



3y/' A , log2i7:^\ 
Y log2(4i^Mv^) J 



(6.62) 



/- / 7Vfe-iV,_i 3 , A , log2(7-^) + M 
V65 max W - + Wl + ^/^ . 

l</c<r V m/e V V log2(4i^^V^)/ 



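Step V: The weak balancing property, (6.10) and (6.11) $\Rightarrow \mathbb{P}(A_1^c \cup A_2^c \cup B_1^c \cup B_2^c \cup B_3^c) \le 5\gamma$.

To see this, note that by Proposition 6.12 we immediately get that $\mathbb{P}(A_1^c) \le \gamma$ and $\mathbb{P}(A_2^c) \le \gamma$ as long as the weak balancing property and

$$1 \gtrsim \Lambda \cdot (\log(s\gamma^{-1}) + 1) \cdot \log(\sqrt{s} K M), \qquad 1 \gtrsim \Upsilon \cdot (\log(s\gamma^{-1}) + 1) \cdot \log(\sqrt{s} K M), \tag{6.63}$$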



are satisfied, where K = maxi</c<^(A^fe — Nk-i)/mk 
'Nk-Nk-i 



A = max 

l<k<r 

and 



rrik 



mi 



1 • Mato • ^1 



Vk^i = min(/iN,M(A^, • ^n,m(A^, 0)^ (6-64) 

Nr - Nr-1 



rrir 



1 • llNr-i • ^r, 



(6.65) 



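where $\tilde s_1 + \ldots + \tilde s_r \le s_1 + \ldots + s_r$ and $\tilde s_k \le S_k(s_1, \ldots, s_r)$. However, clearly, (6.10) and (6.11) imply (6.63). Also, Proposition 6.11 yields that $\mathbb{P}(B_1^c) \le \gamma$ and $\mathbb{P}(B_2^c) \le \gamma$ as long as the weak balancing property and

$$1 \gtrsim \Lambda \cdot \log\left( \frac{4}{\gamma}(M - s) \right), \qquad 1 \gtrsim \Upsilon \cdot \log\left( \frac{4}{\gamma}(M - s) \right) \tag{6.66}$$

are satisfied. However, again, (6.10) and (6.11) imply (6.66). Finally, to bound $\mathbb{P}(B_3^c)$, we use Proposition 6.8 to deduce that $\mathbb{P}(B_3^c) \le \gamma$ when the weak balancing property and

$$1 \gtrsim \Lambda \cdot (\log(\gamma^{-1} s) + 1) \tag{6.67}$$

are satisfied. But (6.10) implies (6.67), and we are done.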



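Step VI: The weak balancing property, (6.10) and (6.11) $\Rightarrow \mathbb{P}(B_4^c) \le \gamma$.
To see this, define the random variables $X_1, \ldots, X_{u-2}$ by

$$X_j = \begin{cases} 0 & \omega_{j+2} \ne \omega_{j+1}, \\ 1 & \omega_{j+2} = \omega_{j+1}. \end{cases} \tag{6.68}$$

We immediately observe that

$$\mathbb{P}(B_4^c) = \mathbb{P}(|\omega_u| < v) = \mathbb{P}(X_1 + \ldots + X_{u-2} \ge u - v). \tag{6.69}$$

Note that if we can provide a bound

$$\tfrac12 \ge \mathbb{P}(X_j = 1), \qquad j = 1, \ldots, u - 2, \tag{6.70}$$

then, by the standard Chernoff bound ([36, Theorem 2.1]), it follows that, for $t > 0$,

$$\mathbb{P}\left( X_1 + \ldots + X_{u-2} \ge (u - 2)(t + 1/2) \right) \le e^{-2(u-2)t^2}. \tag{6.71}$$

Hence, if we let $t = (u - v)/(u - 2) - 1/2$ and plug in the values of $u$ and $v$ from (6.51) into (6.71), then we get that

$$\mathbb{P}(B_4^c) = \mathbb{P}(X_1 + \ldots + X_{u-2} \ge u - v) \le \gamma.$$

Thus, the proposition follows from the next claim.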
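The "plug in" step can be checked numerically. The sketch below assumes that (6.51) reads $u = 4\lceil \log(\gamma^{-1}) + v \rceil$ with $v = \lceil \log_2(8KM\sqrt{s}) \rceil$, and evaluates the right-hand side of (6.71) at $t = (u - v)/(u - 2) - 1/2$; the function name and the test values of $\gamma$, $K$, $M$, $s$ are hypothetical.

```python
import math

def chernoff_plug_in(gamma, K, M, s):
    """Return (u, v, t, bound) with bound = exp(-2 (u - 2) t^2).

    Assumes u = 4 * ceil(log(1/gamma) + v) and
    v = ceil(log2(8 K M sqrt(s))), as read from (6.51).
    """
    v = math.ceil(math.log2(8 * K * M * math.sqrt(s)))
    u = 4 * math.ceil(math.log(1 / gamma) + v)
    t = (u - v) / (u - 2) - 0.5
    bound = math.exp(-2 * (u - 2) * t * t)
    return u, v, t, bound

results = {g: chernoff_plug_in(g, K=4, M=1024, s=50) for g in (0.5, 0.1, 0.01, 1e-6)}
```

In each of these cases the bound falls below $\gamma$ with room to spare, consistent with the conclusion $\mathbb{P}(B_4^c) \le \gamma$.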

Claim: The weak balancing property, (6.10) and (6.11) $\Rightarrow$ (6.70). To prove the claim we first observe that $X_j = 0$ when



||(Pa-Pa/7*(-Po. 

^1 



Qr ^ 



z = j + 2, 



where we recall from (6.52) that 



l<k<r. 



Thus, by choosing $\gamma = 1/4$ in (6.41) in Proposition 6.12 and $\gamma = 1/4$ in (i) in Proposition 6.11, it follows that $\tfrac12 \ge \mathbb{P}(X_j = 1)$, for $j = 1, \ldots, u - 2$, when the weak balancing property is satisfied and



(log (4) + 1) ^>Ci-q-^- {uk,i + . . . + ^fc,r), l<k<r 

(log (4) + 1)-^ > Ci • ((^7^ - 1) • /i^^ • 5i + . . . + (g-i - 1) • • ~Sr) 

Uk,i = min(/iN,M(^, • /^n,m(^, 0) 



(6.72) 
(6.73) 



as well as 



log2(4il^Mv^) 
log(16(M-s)) 
log2(4i^Mv^) 
log(16(M-s)) 



>C2-% •(i^/ci + .-. + ^/cr), l<A:<r 

> C2 • {{qi^ - 1) • /iiVo • Si + . . . + {q~^ - 1) • /iAr,_i • Sr) 



(6.74) 
(6.75) 



with $K = \max_{1 \le k \le r} (N_k - N_{k-1})/m_k$, for some constants $C_1$ and $C_2$. Thus, to prove the claim we must demonstrate that (6.10) and (6.11) $\Rightarrow$ (6.72), (6.73), (6.74) and (6.75). We split this into two cases:

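Case 1: (6.11) $\Rightarrow$ (6.75) and (6.73).

To show the assertion we must demonstrate that if, for $1 \le k \le r$,

$$m_k \gtrsim (\log(s\epsilon^{-1}) + 1) \cdot \hat m_k \cdot \log(KM\sqrt{s}), \tag{6.76}$$

where $\hat m_k$ satisfies

$$1 \gtrsim \left( \frac{N_1 - N_0}{\hat m_1} - 1 \right) \cdot \mu_{N_0} \cdot \tilde s_1 + \ldots + \left( \frac{N_r - N_{r-1}}{\hat m_r} - 1 \right) \cdot \mu_{N_{r-1}} \cdot \tilde s_r, \tag{6.77}$$

we get (6.75) and (6.73). To see this, note that by (6.50) we have that

$$q_k^1 + q_k^2 + (u - 2)\tilde q_k \ge q_k, \qquad 1 \le k \le r, \tag{6.78}$$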






so since q\= q\ = \qki and by (6.78), (6.76) and the choice of /i in (6.51), it fohows that 



2(4([log(7-i)+riog2(8ifMv^)l]) - 2)qk > qu 
rhk 



rrik 



> C- 



Nk - Nk-i 
log(se-^ + 1) log [KM^fs) , 



Nk - Nk-i 

for some constant C. And this gives (by recalling that 7 = e/6) that 

rhk log(se-i + 1) log (KMy^) 



qk>C- 



Nk - Nk-i riog(6e-^) + riog2(8i^Mv^)ll ' 
for some constant C. Hence, given that C is chosen appropriately, we get that qk > 



for 



1 < /c < r, which, since rhk satisfies (6.77), this imphes (6.75), given the appropriate choice of the 
constants. Note that similar reasoning also immediately implies (6.73). 
Case 2: dOO] ) ( [6^ and ^J^. 

To show the assertion we must demonstrate that if, for 1 < /c < r, 

1 > {\og{se-') + 1) • ~ ^'"^ • + . . . + Uk^r) ■ log (KM^) , (6.79) 

rrik 



we obtain (6.74) and (6.7_2 ). To see this, note that b y arg uing as above via the fact that q^ 
Qk ~ \Qki and by (6.78), (6.79) and the choice of ji in (6.51) we have that 



2(4([log(7-') + riog2(8i^Mv^)ll) - 2)qk > qu = 



rrik 



Nk - Nk-i 

> C{\og{se-^) + 1) • {uk,! + . . . + iyk,r) ■ log (i^Mv^) , 
for some constant C. We can now argue as in Case 1 and deduce that 

qk>C- {Vk^l + . . . + Vk^r) 



given that (7 is a appropriately chosen. However, that latter equation implies (6.74) and (6.72), 
given an appropriate choice of the constants. 

This yields the last piece of the puzzle, and we are done.

□ 



Proof of Proposition 6.4. The proof is very close to the proof of Proposition 6.3 and we will simply point out the differences. The strategy of the proof is to show the existence of a $\rho \in \mathrm{ran}(U^*(P_{\Omega_1} \oplus \ldots \oplus P_{\Omega_r}))$ that satisfies (i), (i*)-(iv) in Proposition 6.1 with probability exceeding $1 - \epsilon$.

Step I: The construction of $\rho$: The construction is almost identical to the construction in the proof of Proposition 6.3, except that

(6.80) 



ii = 4[log(7-i)+^l, ^= [log2(8i^Mv^)l, 
ai=a2 = (21og2^^(4i^Mv^))"\ Q^i = 1/2, 3 < i < u, 



as well as 



Pi=p2 = \, /?i = ilog2(4™Vi), 3<i<u, 



and (6.55) gets changed to 



_lU{i} if||(PA-PAC/*(^Po. 



jrPn^)UPA)Zi-i\\i^ <ai\\PA,Zi_i\ 



9; 



and||Pi[/*(^Pni 
otherwise, 



^Pn^)UPAZi-i\\i^ <Pi\\Zi_i\\i^, 



and the events 5^, i = 1,2 in (6.56) get replaced by 

Br. \\PiU*{\p^.^®...®\Pa^JUPAZi-i\\i^ </3i\\Zi.,\\i^, i = l,2. 

qi qr 






Step II: $B_5 \Rightarrow$ (i), (i*). This step is identical to Step II in the proof of Proposition 6.3.
Step III: $B_5 \Rightarrow$ (ii), (iii). Equation (6.60) gets changed to



< 



(01 



i=l 



Ql 



r(i-l)IU° 



1 



< 



i=l j=l 
log2(«) , 



"4^^+21og^/^a) ' 23log2(a) 



1 1 

2^-1^ - 2' 



Step IV: B^ (iv). This step is identical t o Step IV in the p roof of Proposition 6.3 



57. 



Step V: The strong balancing property, (6.13) and (6.14) ¥{AlUA^UBfUB^UB^) < 



We will start by bounding P(5J) and F{B2). Note that by Proposition 



6.11 



(ii) it follows that 

IP(^i) ^ 7 and ^(^2) ^ 7 as long as the strong balancing property is satisfied and 



1 > A • log 



1 > T • loe 



4 ~ 

-iO-s) 

1 



(6.81) 



where = ^({g'^j^^-L, 1/8, {V/c}^^-l, s, M) for z = 1,2 and where is defined in Proposition 
(ii) and A and T are defined in ( 6.64| ) and ( |6.65 ). Note that it is easy to see that we have 



6.11 



ieN: 



max 

ric{i,...,M}, |ri|=s 

r2,jC{Nj_i + l,...,Nj}, j = l, 



|Pr,t/*((gi)-'Pr.,, © ... © {ql)-'Pr,,^)Ue,\\ > 



8v^ 



< M, 



where 



M = mm{i e N : \\PnUP,^_^\\ < l/(i^32v^)}, 

and this follows from the choice in (6.52) where ql = ql = \qk for 1 < A: < r. Thus, it immediately 
follows that ( |6.13D and imply ( |6.8ip . 

As for bounding P(A^), ^(^2) and '^{B^)^ this is done exactly as in Step V of the proof of 
Proposition |6.3[ 



Step VI: The strong balancing property, ( |6.13D and (|6.14D ^ P(5|) < 7. 
To see this, define the random variables Xi, . . . Xu-2 as in (6.68). As in Step VI of the proof 
of Proposition 6.3 it suffices to show that ( |6.13 ) and (6.14) imply 

i>P(X, = l), j = l,...,^i-2, (6.82) 



Claim: The strong balancing property, (6.13) and (6.14) 

1 



(6.82). To prove the 



claim we first observe that Xj = when 



1 



||(Pa-Pa/7*(V^. 

qi 



1 



1, 



-P^.)/7Pa)Z,_i||,^<-||Z,_i||,o 
qr ^ 



1 



PkV\-P^^^...^-P^^)VP^Z,_^\\i^ < -log2(4i^Mv^)||Z,_i 



qi 



:j+2. 



qk=qk. 



1 < < r, and by choosing 



Thus, by again recalling from (6.52) that ql = q^ = • 
7 = 1/4 in (6.41) in Proposition 6.12 and 7 = 1/4 in (ii) in Propositio n |6.1l[ we conclude that 
(6.82) follows when the strong balancing property is satisfied as well as (6.72) and (6.73). and 



log2(4KMv^) 
log (l6{M - s)^ 

log2(4i^Mv^) 
log (l6{M - s)^ 



>C2 - %^ ■ {l^k,! + . . . + T^k,r), 1 <k <r 



> C2 • ((^1 ^ - 1) • /iATo • Si + . . . + (g^ ^ - 1) . IJ.Nr.-i 



(6.83) 
(6.84) 






for $K = \max_{1 \le k \le r} (N_k - N_{k-1})/m_k$, for some constants $C_1$ and $C_2$. Thus, to prove the claim we must demonstrate that (6.13) and (6.14) $\Rightarrow$ (6.72), (6.73), (6.83) and (6.84). This is done by repeating Case 1 and Case 2 in Step VI of the proof of Proposition 6.3 almost verbatim, except replacing $M$ by $\tilde M$. □



6.4 Proof of Theorem 3.2



We now prove Theorem 3.2. First it is necessary to describe the wavelet construction in more detail; we refer to [7] for further background. Let $[0, a]$ be a compact interval and suppose that we are given an orthonormal mother wavelet $\Psi$ and an orthonormal scaling function $\Phi$ such that $\mathrm{supp}(\Psi) = \mathrm{supp}(\Phi) = [0, a]$ for some $a \ge 1$. We also assume that for some $\alpha \ge 1$ and $C > 0$,

$$|\hat\Phi(\xi)| \le \frac{C}{(1 + |\xi|)^{\alpha}}, \qquad |\hat\Psi(\xi)| \le \frac{C}{(1 + |\xi|)^{\alpha}}. \tag{6.85}$$



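The most standard approach is to consider the following collection of functions

$$\Omega_a = \{ \Phi_k, \Psi_{j,k} : \mathrm{supp}(\Phi_k)^o \cap [0, a] \ne \emptyset, \ \mathrm{supp}(\Psi_{j,k})^o \cap [0, a] \ne \emptyset, \ j \in \mathbb{Z}_+, \ k \in \mathbb{Z} \},$$

where

$$\Phi_k = \Phi(\cdot - k), \qquad \Psi_{j,k} = 2^{j/2} \Psi(2^j \cdot - k)$$

(the notation $K^o$ denotes the interior of a set $K \subseteq \mathbb{R}$). This now gives

$$\{ f \in L^2(\mathbb{R}) : \mathrm{supp}(f) \subseteq [0, a] \} \subseteq \overline{\mathrm{span}}\{ \varphi : \varphi \in \Omega_a \} \subseteq \{ f \in L^2(\mathbb{R}) : \mathrm{supp}(f) \subseteq [-T_1, T_2] \},$$

where $T_1, T_2 > 0$ are such that $[-T_1, T_2]$ contains the support of all functions in $\Omega_a$. Note that the inclusions may be proper (but not always, as is the case with the Haar wavelet). It is easy to see that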

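$$\mathrm{supp}(\Psi_{j,k})^o \cap [0, a] = \emptyset \iff \frac{a + k}{2^j} \le 0 \ \text{or} \ a \le \frac{k}{2^j}, \qquad \mathrm{supp}(\Phi_k)^o \cap [0, a] = \emptyset \iff a + k \le 0 \ \text{or} \ a \le k,$$

and therefore

$$\Omega_a = \{ \Phi_k : |k| = 0, \ldots, \lceil a \rceil - 1 \} \cup \{ \Psi_{j,k} : j \in \mathbb{Z}_+, \ k \in \mathbb{Z}, \ -\lceil a \rceil < k < 2^j \lceil a \rceil \}.$$

We order $\Omega_a$ in increasing order of wavelet resolution as follows:

$$\{ \Phi_{-\lceil a \rceil + 1}, \ldots, \Phi_{-1}, \Phi_0, \Phi_1, \ldots, \Phi_{\lceil a \rceil - 1}, \Psi_{0, -\lceil a \rceil + 1}, \ldots, \Psi_{0,-1}, \Psi_{0,0}, \Psi_{0,1}, \ldots, \Psi_{0, \lceil a \rceil - 1}, \Psi_{1, -\lceil a \rceil + 1}, \ldots \}. \tag{6.86}$$

By the definition of $\Omega_a$ we let $T_1 = \lceil a \rceil - 1$ and $T_2 = 2\lceil a \rceil - 1$.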

Having constructed an orthonormal wavelet system for $[0, a]$ we now introduce the appropriate Fourier sampling basis. We must sample at at least the Nyquist rate. Hence we let $\epsilon \le 1/(T_1 + T_2)$ be the sampling density (note that $1/(T_1 + T_2)$ is the Nyquist criterion for functions supported on $[-T_1, T_2]$) and define

$$\psi_j(x) = \sqrt{\epsilon} \, e^{2\pi i j \epsilon x} \chi_{[-T_1/(\epsilon(T_1 + T_2)), \, T_2/(\epsilon(T_1 + T_2))]}(x), \qquad j \in \mathbb{Z}.$$

This gives an orthonormal sampling basis for the space of functions supported on $[-T_1, T_2]$.
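The sampling setup above can be made concrete with a small numerical check. The sketch below computes $T_1$, $T_2$ and the largest admissible density $1/(T_1 + T_2)$ for a hypothetical support length $a = 3$, and verifies orthonormality of the $\psi_j$ by a midpoint rule over the interval of length $1/\epsilon$ on which they live. The function names and the value of `a` are illustrative assumptions, not taken from the text.

```python
import cmath
import math

def sampling_parameters(a):
    """Return (T1, T2, eps_max) with T1 = ceil(a) - 1, T2 = 2 ceil(a) - 1."""
    T1 = math.ceil(a) - 1
    T2 = 2 * math.ceil(a) - 1
    return T1, T2, 1 / (T1 + T2)

def inner_product(j, k, eps, T1, T2, n=4096):
    """Approximate <psi_j, psi_k> by the midpoint rule with n nodes."""
    lo = -T1 / (eps * (T1 + T2))
    hi = T2 / (eps * (T1 + T2))
    h = (hi - lo) / n
    total = 0j
    for p in range(n):
        x = lo + (p + 0.5) * h
        # psi_j(x) * conj(psi_k(x)) = eps * exp(2 pi i (j - k) eps x)
        total += eps * cmath.exp(2j * math.pi * (j - k) * eps * x)
    return total * h

T1, T2, eps_max = sampling_parameters(3)   # a = 3 is a hypothetical choice
same = inner_product(0, 0, eps_max, T1, T2)
different = inner_product(1, 3, eps_max, T1, T2)
```

The interval has length exactly $1/\epsilon$ whatever $T_1$ and $T_2$ are, which is what makes the exponentials with spacing $\epsilon$ orthonormal on it.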



Proof of Theorem\3^ Note that /i(/7) > |(<^>, V^o)!' 



which gives the first result. 



To show that $\mu(P_N^\perp U) = \mathcal{O}(N^{-1})$ observe that the decay estimate in (6.85) yields



Ijl{P^U) < max max \ {if,^l)k)\^ 



max < 



max max 

|/e|>f REN 2^ 



< max max 



-27riefc\ 



, max 
|fe|>f 



$ {-2mek) 



|fc|>f flsN 2« (1 + |27refc2-^|)^ 

C2 



< max 



Ren 2^ (1 + |7re7V2-^|) 



2cx • 






The function f{x) = + 7re7V/x)"^" on [1, oo) is such that f'{7ieN{2a - 1)) = 0. Hence, 



7rA/'(2a- l)(l + l/(2a- 1)) 



2cx 



which gives $\mu(P_N^\perp U) = \mathcal{O}(N^{-1})$.

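Let $\Omega_{R,a}$ contain all wavelets in $\Omega_a$ with resolution less than $R$, so

$$\Omega_{R,a} = \{ \varphi \in \Omega_a : \varphi = \Psi_{j,k}, \ j < R, \ k \in \mathbb{Z} \ \text{or} \ \varphi = \Phi_k, \ k \in \mathbb{Z} \}.$$

Then, denoting the size of $\Omega_{R,a}$ by $N^{(R)}$, it is easy to verify that $N^{(R)} = 2^R \lceil a \rceil + (R + 1)(\lceil a \rceil - 1)$.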
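The count $N^{(R)}$ can be verified by direct enumeration. The sketch below assumes the index sets as read from the description of $\Omega_a$, namely $|k| \le \lceil a \rceil - 1$ for the scaling functions and $-\lceil a \rceil < k < 2^j \lceil a \rceil$ for the wavelets at level $j$; the function names are hypothetical.

```python
import math

def count_by_enumeration(a, R):
    """Count scaling functions plus wavelets with level j < R."""
    ca = math.ceil(a)
    scaling = [k for k in range(-ca + 1, ca)]          # |k| <= ceil(a) - 1
    wavelets = [(j, k)
                for j in range(R)
                for k in range(-ca + 1, (2 ** j) * ca)]  # -ceil(a) < k < 2^j ceil(a)
    return len(scaling) + len(wavelets)

def count_closed_form(a, R):
    """N^(R) = 2^R ceil(a) + (R + 1)(ceil(a) - 1)."""
    ca = math.ceil(a)
    return 2 ** R * ca + (R + 1) * (ca - 1)

checks = [(a, R, count_by_enumeration(a, R), count_closed_form(a, R))
          for a in (1, 2.5, 7) for R in range(6)]
```

For the Haar case $a = 1$ the formula reduces to $N^{(R)} = 2^R$, as expected.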
Given any AT G N such that TV > N^^\ let R be such that 

$$N^{(R)} \le N < N^{(R+1)}.$$

Then, for each $n \ge N$, there exists some $j \ge R$ and $l \in \mathbb{Z}$ such that the $n$th element via the ordering (6.86) is $\varphi_n = \Psi_{j,l}$.



f^{UPN) = niaxmax|((/9n,V^/c)r 

n>N /cGZ 



max max — ^ 

j>R kez 2^ 

<ll*lli~^ 

<4||*||ie.^, 



-27riek\ 

) 



where the last line follows because $N < N^{(R+1)} = 2^{R+1} \lceil a \rceil + (R + 2)(\lceil a \rceil - 1)$ implies that



2-^ < ^ {2\a] + {R^2){\a] - 1)2-^) < ^ 



So $\mu(U P_N^\perp) = \mathcal{O}(N^{-1})$.



□ 



7 Numerical examples 

In this section we present the numerical examples to support our theory. 
7.1 The GLPU Phantom 

To illustrate the main message that asymptotic sparsity and asymptotic incoherence yield the 
phenomenon that the success of compressed sensing is resolution dependent, it is preferable to 
work with a continuous model, which is closer to real life MRI scenarios. This is impossible if one 
employs an already discretized model at a fixed resolution level. For this reason we choose to use 
the novel GLPU phantom invented by Guerquin-Kern, Lejeune, Pruessmann and Unser in [30] in 
favour of the standard discretized Shepp-Logan Phantom from MATLAB. The GLPU phantom 
is a so-called analytic phantom, in that it is not a rasterized image, but rather a continuous (or 
infinite-resolution) object defined by analytic curves, such as Bezier curves. The MATLAB code 
offered by the authors allows one to compute the continuous (integral) Fourier transform (as in a 
real life MRI scenario) of the GLPU phantom, for any resolution, to avoid the inverse crime which 
results from using the discrete Fourier transform. 

We thus treat the GLPU phantom as a function $f : D \to \mathbb{R}$ which is measured by taking equispaced pointwise samples of its continuous Fourier transform:

$$\hat f(\omega) = \int_D f(x) e^{-2\pi i \omega \cdot x} \, dx,$$

where $D$ is typically $[0, 1]^2$.

We note that full resolution and uncropped versions of the images used in this section are available online [1].






(a) n = 10, m = 0.05, a = 1.75, b = 2.25

(b) n = 100, m = 0.05, a = 1.75, b = 2.4726

Figure 6: Examples of subsampling maps at 2048 × 2048 that subsample p = 15% of Fourier coefficients. Left: 10 levels, right: 100 levels. The color intensity denotes the fraction $p_k$ of random samples taken uniformly, i.e. white: 100% samples, black: 0% samples.



7.2 Subsampling scheme 

Our subsampling scheme divides the 2D Fourier spectrum of N × N coefficients into n regions delimited by n − 1 equispaced concentric circles plus the full square, an example being shown in Figure 6. Normalizing the 2D Fourier spectrum to $[-1, 1]^2$, the circles have radius $r_k$ with $k = 0, \ldots, n - 1$, which are given by $r_0 = m$ and $r_k = k \cdot \frac{1 - m}{n - 1}$ for $k > 0$, where $0 < m < 1$ is a parameter. Inside each of the n regions, the fraction $p_k$ of Fourier coefficients taken with uniform probability is given by:

$$p_k = \exp\left( - \left( \frac{b \, k}{n} \right)^{a} \right), \tag{7.1}$$

where $k = 0, \ldots, n$ and $a > 0$ and $b > 0$ are parameters, so the total fraction of coefficients that are subsampled from the full spectrum is then $p = \sum_k p_k S_k$, where $S_k$ is the normalized area of the $k$th region. Since $p_0 = 1$, the first region samples all Fourier coefficients, and the remaining regions sample a monotonically smaller fraction of coefficients ($p_{k+1} < p_k$).
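A short sketch of the level fractions may help in seeing how the parameters shape the scheme. It assumes that (7.1) has the form $p_k = \exp(-(bk/n)^a)$ and evaluates it for the $n$ regions $k = 0, \ldots, n - 1$; the region areas $S_k$ are left as an input because they depend on the geometry, and the two-region areas in the usage line are made up purely for illustration. Function names are hypothetical.

```python
import math

def level_fractions(n, a, b):
    """Return p_k = exp(-(b * k / n) ** a) for the regions k = 0, ..., n - 1."""
    if n < 1 or a <= 0 or b <= 0:
        raise ValueError("need n >= 1, a > 0 and b > 0")
    return [math.exp(-((b * k / n) ** a)) for k in range(n)]

def total_fraction(p, areas):
    """Overall sampled fraction p = sum_k p_k * S_k for normalized areas S_k."""
    if len(p) != len(areas) or abs(sum(areas) - 1.0) > 1e-9:
        raise ValueError("areas must match p in length and sum to 1")
    return sum(p_k * s_k for p_k, s_k in zip(p, areas))

# Parameters quoted for Figure 8: n = 100, a = 1.75, b = 4.125.
p = level_fractions(n=100, a=1.75, b=4.125)

# Made-up two-region example, only to show how the total is assembled.
example_total = total_fraction([1.0, 0.5], [0.25, 0.75])
```

Larger $b$ makes the fractions decay faster with $k$, while $a$ controls how sharply the decay sets in.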



The function (7.1) is very similar to the generalized Gaussian distribution function. Although 
we obtained good results with this subsampling scheme, we do not claim that it is optimal, nor 
the best all-rounder. As stated and shown previously, an optimal subsampling scheme is highly 
dependent on the signal structure (i.e. image content in this case) and also resolution dependent. 



Throughout this section we use n = 100 levels, but (7.1) is versatile enough and we obtained 
similar reconstruction errors with smaller values of n. 

The reasons we used such a subsampling scheme include a high degree of control over its shape, as well as a good match with the behaviour of the Fourier spectrum and the sparsity of real-life images, e.g. MRI images. For a fixed M, the first M × M Fourier coefficients do not change when the image resolution N increases, and the image is asymptotically sparse and asymptotically incoherent in the chosen wavelet basis as the resolution increases. It is thus desirable that the subsampling scheme reshapes as a function of the resolution to achieve the lowest error when a constant fraction p of coefficients is to be subsampled. Using a normalized spectrum for (7.1) achieves that purpose to an extent,^1 while the parameters a, b can be made functions of N and of p so that the asymptotic sparsity is even better exploited as image resolution increases.



7.3 Resolution dependence: Fixed fraction p of samples 

The important message here is that, regardless of the subsampling scheme used, the quality of the reconstruction will increase as the resolution increases when compared to the full sampled version at the respective resolution. As previously explained, this is because at high resolution levels the image signal is increasingly sparse and incoherent, which can be fruitfully exploited.

^1 In this sense, (7.1) will assign different probabilities for the same 2D Fourier coefficient $\omega_1, \omega_2$ as the resolution changes, since the n regions in (7.1) are defined on the normalized 2D Fourier spectrum.







(a) Full sampling. Relative error is 0%.
(b) Largest 5% of coefficients. Linear reconstruction. Relative error is 17.62%.
(c) Multilevel, 5% of coefficients, Gaussian law (7.1). Relative error is 18.59%.
(d) 5% of coefficients, power law.^2 Relative error is 22.22%.

Figure 7: Subsampling 5% of Fourier coefficients. The bottom row shows the subsampling map used. The relative error to the full-sampled case is large regardless of the subsampling scheme used. The relative error is computed as $\|f - f_p\|_2 / \|f\|_2$, where $f$ and $f_p$ represent the reconstructed full sampled image and subsampled image respectively.
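The relative error quoted in the figures is the ratio of two Euclidean norms. A minimal sketch of that formula follows, treating images as flat sequences of pixel values; the function name `relative_error` and the toy inputs are hypothetical and unrelated to the actual experiment data.

```python
import math

def relative_error(f, f_p):
    """Return ||f - f_p||_2 / ||f||_2 for two equally long pixel sequences."""
    if len(f) != len(f_p):
        raise ValueError("images must have the same number of pixels")
    num = math.sqrt(sum(abs(x - y) ** 2 for x, y in zip(f, f_p)))
    den = math.sqrt(sum(abs(x) ** 2 for x in f))
    if den == 0:
        raise ValueError("reference image must be nonzero")
    return num / den

# Toy values only: identical images give 0, an all-zero reconstruction gives 1.
err_same = relative_error([3.0, 4.0], [3.0, 4.0])
err_zero = relative_error([3.0, 4.0], [0.0, 0.0])
```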



At low resolution levels however, the sparsity level is also low and subsampling a small fraction p of Fourier coefficients is bound to give a poor quality reconstruction. In Figure 7 we show an experiment where we subsample 5% of Fourier coefficients from the first 256 × 256 using three different schemes and reconstruct using Daubechies 4 wavelets. As expected, the result is quite poor. We tried a large number of different random subsampling schemes and wavelet bases and they all gave similarly poor results, visually and numerically. This may mislead one to believe that compressed sensing fails with a high compression ratio, i.e. low p.

A much different conclusion is reached when the experiment is repeated at higher resolutions. A first example that illustrates the resolution dependence of compressed sensing reconstruction is shown in Figure 8. In this experiment we fixed the parameters m, a, b of the Gaussian subsampling scheme (7.1), and subsampled p = 5% of Fourier coefficients at increasing image resolutions from 256 × 256 to 4096 × 4096, reconstructing in the Daubechies 4 wavelet basis. The high and asymptotic sparsity of the wavelet coefficients at high resolutions allows a markedly better quality reconstruction than at low resolutions when compared to the full sampled version of the same resolution.

We want to stress the fact that any subsampling scheme yields the same effect of resolution 
dependency, which was confirmed in our tests. 

A further example is shown in Figure 9, where we sampled 10% of Fourier coefficients, again keeping n = 100 levels but optimizing a and b to give the best reconstruction in both cases. In this case, the high resolution 4096 × 4096 reconstruction has hardly any visible artefacts left. A fraction of p = 10% means an effective factor of 10 speed-up for an MRI, and at this resolution it is equivalent in weight to a full sampled image of 1296 × 1296, but with the tremendous advantage that it is actually of better quality and with more details than the full sampled 1296 × 1296 equivalent.
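The "equivalent in weight" figure is simple arithmetic: a fraction p of an N × N grid contains as many coefficients as a full grid of side $\sqrt{p}\,N$. The sketch below reproduces it for the numbers quoted above; the function name is hypothetical.

```python
import math

def equivalent_full_side(N, p):
    """Side length of a fully sampled square grid holding p * N^2 coefficients."""
    return math.sqrt(p) * N

side = equivalent_full_side(4096, 0.10)   # roughly 1295.3
rounded_side = math.ceil(side)            # rounds up to the quoted 1296
speed_up = 1 / 0.10                       # factor of 10 fewer samples
```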

^2 The power law subsampling scheme involves sampling the 2D Fourier coefficient $\omega_1, \omega_2$ with probability $(\omega_1^2 + \omega_2^2 + 1)^{-\alpha}$, where $-N/2 \le \omega_1, \omega_2 \le N/2 - 1$ and $\alpha$ is a parameter here chosen as $\alpha = 3/2$.
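The power law scheme in the footnote can be sketched in a few lines. This assumes the sampling probability is $(\omega_1^2 + \omega_2^2 + 1)^{-\alpha}$ with $\alpha = 3/2$ on the index range $-N/2 \le \omega_1, \omega_2 \le N/2 - 1$; the function names, the small grid size and the fixed seed are illustrative choices only.

```python
import random

def power_law_probability(w1, w2, alpha=1.5):
    """Sampling probability (w1^2 + w2^2 + 1)^(-alpha) for coefficient (w1, w2)."""
    return (w1 * w1 + w2 * w2 + 1) ** (-alpha)

def power_law_mask(N, alpha=1.5, seed=0):
    """Draw the set of sampled coefficients on the N x N frequency grid."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    half = N // 2
    return {(w1, w2)
            for w1 in range(-half, half)
            for w2 in range(-half, half)
            if rng.random() < power_law_probability(w1, w2, alpha)}

mask = power_law_mask(16)
```

The zero frequency is always kept, since its probability is exactly 1, and the density falls off quickly away from the origin.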






Resolution      Relative error
256 × 256       19.86%
512 × 512       10.69%
1024 × 1024     7.35%
2048 × 2048     4.87%
4096 × 4096     3.06%

Figure 8: Multilevel subsampling of 5% Fourier coefficients using (7.1) with fixed parameters n = 100, a = 1.75, b = 4.125. The left column (full sampled) and center column (subsampled) are crops of 256 × 256 pixels of the original full resolution versions, while the right column shows the uncropped subsampling map used. The error shown is the relative error between the subsampled and full sampled versions.







(a) 256 x 256 full sampled (left) and 10% subsampled (center). Relative error to full sampling is 
12.12%. Artefacts are obvious. 




(b) 4096 × 4096 full sampled (left) and 10% subsampled (center), showing crops of 256 × 256 to preserve pixel size. Relative error to full sampling is 1.48%. Artefacts are hardly visible.

Figure 9: Improvement at 10% subsampling between resolutions. The subsampling map is shown in the right column. In (7.1) we used n = 100 and m = 0.01 and varied a and b to obtain the best result in each case.



7.4 Resolution dependence: Fixed number of samples 

The above result of resolution dependence with a fixed fraction p is due to the asymptotic sparsity 
and asymptotic incoherence, but is in part also due to the fact that a fixed fraction p does mean 
more coefficients being sampled as the resolution increases. 

A more spectacular result of asymptotic sparsity and asymptotic incoherence is obtained by
running a similar experiment, but this time fixing the number of coefficients being sampled
rather than the fraction p. This was done in Figure 10, where the same number of coefficients
was sampled in every case: 512^2 = 262144 Fourier coefficients. Fine details were hidden in the
image and then three reconstructions were performed: (a) the fully sampled version of 512x512
pixels, (b) the linear reconstruction of the subsampled 2048x2048 version, obtained by zero-padding
the first 512x512 coefficients, and (c) the multilevel subsampled 2048x2048 reconstruction using
(7.1) and the Daubechies 4 wavelet basis.
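
Reconstruction (b) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the image is square with even side length, the discrete FFT stands in for the Fourier samples, and the relative error is taken in the l2 norm. The names `lowpass_zero_padded` and `relative_error` are hypothetical.

```python
import numpy as np

def lowpass_zero_padded(image, K):
    """Linear reconstruction from the K x K lowest-frequency Fourier coefficients."""
    N = image.shape[0]
    # Centred spectrum: the zero frequency sits at index N // 2 on each axis.
    F = np.fft.fftshift(np.fft.fft2(image))
    lo, hi = N // 2 - K // 2, N // 2 + K // 2
    padded = np.zeros_like(F)
    padded[lo:hi, lo:hi] = F[lo:hi, lo:hi]  # keep K^2 coefficients, zero the rest
    # The retained block is not symmetric about zero, so drop the small
    # imaginary part that can remain for a real-valued image.
    return np.fft.ifft2(np.fft.ifftshift(padded)).real

def relative_error(x, ref):
    """Relative l2 error of x with respect to ref (assumed error measure)."""
    return np.linalg.norm(x - ref) / np.linalg.norm(ref)

# Small demonstration: detail above the cut-off is lost entirely.
N, K = 64, 16
t = np.arange(N)
fine = np.cos(2 * np.pi * 20 * t / N)[:, None] * np.ones((1, N))
print(relative_error(lowpass_zero_padded(fine, K), fine))
```

In the setting of Figure 10 one would take K = 512 on a 2048 x 2048 image, so the retained block holds 512^2 = 262144 coefficients, that is 6.25% of the 2048^2 available. Any detail carried by frequencies outside that block cannot be recovered by this linear approach.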

Even though the number of coefficients was the same in all three reconstructions, the higher
resolution and multilevel subsampling mean that the asymptotic sparsity and incoherence of
wavelet coefficients can be fruitfully exploited, and the fine details are recovered much more
clearly, even in the presence of noise.

This effectively means that by simply going higher in resolution or, equivalently, higher in the
Fourier spectrum, one can recover a signal much closer to the exact one while taking the same
number of measurements.






Figure 10: Subsampling a fixed number of 512^2 = 262144 Fourier coefficients, in three
reconstruction scenarios. Left column: no noise. Right column: white noise, SNR = 16 dB. (a) 512x512 fully
sampled reconstruction, (b) 2048x2048 linear reconstruction from the first 512 x 512 = 262144
Fourier coefficients (zero padded), (c) 2048x2048 reconstructed from 262144 Fourier coefficients
taken with a multilevel scheme using (7.1) and Daubechies 4, with m = 0.05, a = 1.25, b = 4.2539.



8 Conclusions 

The purpose of this paper was to bridge an important gap between compressed sensing theory
and its use in imaging applications by introducing a framework based on asymptotic incoherence,
asymptotic sparsity and multilevel sampling. In doing so, we have not only given mathematical
credence to the abundance of numerical evidence suggesting the usefulness of variable density
sampling strategies, but also drawn important conclusions about the resolution dependence of
compressed sensing and the signal structure dependence of optimal sampling strategies.

Recently, Krahmer & Ward have shown recoverability in the discrete bivariate Haar wavelet
basis from Fourier measurements taken according to an inverse square law [31]. Their analysis is
valid for bivariate Haar wavelets with Fourier samples in the finite-dimensional setting for only
one particular variable density. It is also based on the RIP, which, as discussed in Section 5.4,
is problematic in imaging and does not take into account asymptotic sparsity, which is key in
real-life situations.

Our framework, on the other hand, allows for arbitrary sampling and sparsity systems in both
the finite- and infinite-dimensional setting, as well as rather general types of sampling strategies,
and avoids the RIP. We also make clear the interplay between asymptotic sparsity and asymptotic
incoherence, and how they lead to the key conclusion of resolution dependence. Having said this,
recoverability results for TV-minimization (an important and powerful technique in imaging) are
also given in [31]. Whilst beyond the scope of this paper, we expect that our results can also be
extended to the TV case. This is an objective of ongoing work.

As explained in the paper, traditional sparsity is not sufficient to describe the behaviour of
natural images. Asymptotic sparsity in levels more accurately describes natural images, and hence
the behaviour of multilevel sampling strategies. We remark that we are by no means the first to
advocate a more structured notion of sparsity. So-called model-based compressed sensing (see [8, 24]
and references therein) also moves beyond sparsity. However, this is quite different to our approach.
Algorithms such as those found in [8] seek to incorporate substantially more intricate models than
standard sparsity (or, indeed, asymptotic sparsity in levels) so as to further reduce the number
of measurements required in recovery problems that are already incoherent beyond that needed
for standard compressed sensing algorithms. The main problem in [8] is that of sampling with
random Gaussians. Although highly incoherent, this is not a suitable model for MRI, where the
measurement system (i.e. Fourier samples) is constrained by the physical device. On the other
hand, the motivation in our work stems primarily from the desire to tackle the fixed, and high,
coherence. The fact that the resulting algorithm of multilevel sampling is well suited for precisely
the types of signals and images one sees in applications is, in many respects, serendipitous.

We have concluded in our work that the optimal sampling strategy is dependent on the signal 
structure. Further work is required to determine such strategies in a rigorous empirical manner for 
important classes of images. We expect our main theorems to give important insights into these 
investigations, such as how many levels to choose, how to choose their relative sizes, etc. One 
conclusion of our work, however, is that approaches to design optimal sampling densities based 
solely on minimizing coherences (i.e. not taking into account asymptotic sparsity) may be of little 
use in practice unless they are trained on large families of images having similar structures (e.g. 
brain images). 
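
To illustrate the design choices mentioned above (how many levels, their relative sizes, and how densely to sample each), the following Python sketch builds a multilevel subsampling mask. It is not the scheme (7.1). The square, equal-width frequency bands and the per-level fractions passed in `fractions` are hypothetical choices made for the example, as is the function name `multilevel_mask`.

```python
import numpy as np

def multilevel_mask(N, fractions, seed=0):
    """Sample a fraction fractions[k] of the coefficients uniformly at random in level k."""
    rng = np.random.default_rng(seed)
    omega = np.arange(-N // 2, N // 2)
    w1, w2 = np.meshgrid(omega, omega, indexing="ij")
    # Assumption: levels are nested square bands measured by max(|w1|, |w2|).
    radius = np.maximum(np.abs(w1), np.abs(w2))
    n = len(fractions)
    # Assumption: bands of equal width between frequency 0 and N/2.
    edges = np.linspace(0, N // 2, n + 1)
    level = np.clip(np.searchsorted(edges, radius, side="right") - 1, 0, n - 1)
    mask = np.zeros((N, N), dtype=bool)
    for k, p in enumerate(fractions):
        idx = np.flatnonzero(level == k)       # flat indices of level k
        m_k = int(round(p * idx.size))         # number of samples in level k
        chosen = rng.choice(idx, size=m_k, replace=False)
        mask.flat[chosen] = True
    return mask

# Fully sample the lowest band, then sample progressively less.
mask = multilevel_mask(64, fractions=[1.0, 0.5, 0.1, 0.02])
print(mask.sum(), "of", mask.size, "coefficients sampled")
```

Changing the number of entries in `fractions` changes the number of levels, and changing `edges` changes their relative sizes; these are the quantities the main theorems are expected to inform.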

Another current investigation involves deriving refined versions of our theorems (which are 
completely general) in the case of Fourier sampling with wavelets. In particular, we expect it to 
be possible to demonstrate that the 'interference' between sparsity levels (see Section 4.2) cannot
be too bad in this setting, meaning that the key bounds on the number of measurements can be 
simplified to something similar to the block diagonal case (see Section 4.2.2). This work will build 
on the techniques developed in [7].

Finally, we would like to point out that asymptotic sparsity is not only relevant for wavelets.
Any approximation system whose power lies in nonlinear, as opposed to linear, approximation
will give rise to asymptotically sparse representations. Such systems include curvelets [9, 11],
contourlets [20, 37] and shearlets [17, 18, 32], to name but a few. Hence we expect our framework
to have relevance in numerous other applications.



Acknowledgements 

The authors would like to thank Akram Aldroubi, Emmanuel Candes, Massimo Fornasier, Felix 
Krahmer, Thomas Strohmer, Gerd Teschke and Rachel Ward for useful discussions and comments. 



References 

[1] http://subsample.org, Feb. 2013.

[2] B. Adcock and A. C. Hansen. Generalized sampling and infinite-dimensional compressed sensing.
Technical report NA2011/02, DAMTP, University of Cambridge, 2011.

[3] B. Adcock and A. C. Hansen. A generalized sampling theorem for stable reconstructions in arbitrary
bases. J. Fourier Anal. Appl., 18(4):685-716, 2012.

[4] B. Adcock and A. C. Hansen. Stable reconstructions in Hilbert spaces and the resolution of the Gibbs 
phenomenon. Appl. Comput. Harmon. Anal., 32(3):357-388, 2012. 

[5] B. Adcock, A. C. Hansen, E. Herrholz, and G. Teschke. Generalized sampling: extension to frames 
and inverse and ill-posed problems. Inverse Problems, (to appear). 

[6] B. Adcock, A. C. Hansen, and C. Poon. Beyond consistent reconstructions: optimality and sharp 
bounds for generalized sampling, and application to the uniform resampling problem. Preprint, 2012. 

[7] B. Adcock, A. C. Hansen, and C. Poon. On optimal wavelet reconstructions from Fourier sam- 
ples: linearity and universality of the stable sampling rate. Technical report NA2012/07, DAMTP, 
University of Cambridge, 2012. 






[8] R. G. Baraniuk, V. Cevher, M. F. Duarte, and C. Hegde. Model-based compressive sensing. IEEE
Trans. Inform. Theory, 56(4):1982-2001, 2010.

[9] E. Candes and D. L. Donoho. Recovering edges in ill-posed inverse problems: optimality of curvelet
frames. Ann. Statist., 30(3):784-842, 2002.

[10] E. J. Candes. An introduction to compressive sensing. IEEE Signal Process. Mag., 25(2):21-30, 2008.

[11] E. J. Candes and D. Donoho. New tight frames of curvelets and optimal representations of objects
with piecewise singularities. Comm. Pure Appl. Math., 57(2):219-266, 2004.

[12] E. J. Candes and Y. Plan. A probabilistic and RIPless theory of compressed sensing. IEEE Trans.
Inform. Theory, 57(11):7235-7254, 2011.

[13] E. J. Candes and J. Romberg. Sparsity and incoherence in compressive sampling. Inverse Problems,
23(3):969-985, 2007.

[14] E. J. Candes, J. Romberg, and T. Tao. Robust uncertainty principles: exact signal reconstruction
from highly incomplete frequency information. IEEE Trans. Inform. Theory, 52(2):489-509, 2006.

[15] Y. Chi, L. L. Scharf, A. Pezeshki, and A. Calderbank. Sensitivity to basis mismatch in compressed
sensing. IEEE Trans. Signal Process., 59(5):2182-2195, 2011.

[16] M. S. Crouse, R. D. Nowak, and R. G. Baraniuk. Wavelet-based statistical signal processing using
hidden Markov models. IEEE Trans. Signal Process., 46:886-902, 1998.

[17] S. Dahlke, G. Kutyniok, P. Maass, C. Sagiv, H.-G. Stark, and G. Teschke. The uncertainty principle
associated with the continuous shearlet transform. Int. J. Wavelets Multiresolut. Inf. Process.,
6(2):157-181, 2008.

[18] S. Dahlke, G. Kutyniok, G. Steidl, and G. Teschke. Shearlet coorbit spaces and associated Banach
frames. Appl. Comput. Harmon. Anal., 27(2):195-214, 2009.

[19] M. A. Davenport, M. F. Duarte, Y. C. Eldar, and G. Kutyniok. Introduction to compressed sensing.
In Compressed Sensing: Theory and Applications. Cambridge University Press, 2011.

[20] M. N. Do and M. Vetterli. The contourlet transform: An efficient directional multiresolution image
representation. IEEE Trans. Image Proc., 14(12):2091-2106, 2005.

[21] D. L. Donoho. Compressed sensing. IEEE Trans. Inform. Theory, 52(4):1289-1306, 2006.

[22] D. L. Donoho and M. Elad. Optimally sparse representation in general (non-orthogonal) dictionaries
via ℓ1 minimization. Proc. Natl Acad. Sci. USA, 100:2197-2202, 2003.

[23] D. L. Donoho and X. Huo. Uncertainty principles and ideal atomic decomposition. IEEE Trans.
Inform. Theory, 47:2845-2862, 2001.

[24] M. F. Duarte and Y. C. Eldar. Structured compressed sensing: from theory to applications. IEEE
Trans. Signal Process., 59(9):4053-4085, 2011.

[25] Y. C. Eldar and G. Kutyniok, editors. Compressed Sensing: Theory and Applications. Cambridge
University Press, 2012.

[26] M. Fornasier and H. Rauhut. Compressive sensing. In Handbook of Mathematical Methods in Imaging,
pages 187-228. Springer, 2011.

[27] S. Foucart and H. Rauhut. A Mathematical Introduction to Compressive Sensing. In preparation,
2013.

[28] D. Gross. Recovering low-rank matrices from few coefficients in any basis. IEEE Trans. Inf. Theor.,
57(3):1548-1566, Mar. 2011.

[29] M. Guerquin-Kern, M. Haberlin, K. Pruessmann, and M. Unser. A fast wavelet-based reconstruction
method for magnetic resonance imaging. IEEE Transactions on Medical Imaging, 30(9):1649-1660,
2011.

[30] M. Guerquin-Kern, L. Lejeune, K. P. Pruessmann, and M. Unser. Realistic analytical phantoms for
parallel Magnetic Resonance Imaging. IEEE Trans. Med. Imaging, 31(3):626-636, 2012.

[31] F. Krahmer and R. Ward. Compressive imaging: stable and robust recovery from variable density
frequency samples. Preprint, 2012.

[32] G. Kutyniok, J. Lemvig, and W.-Q. Lim. Compactly supported shearlets. In M. Neamtu and
L. Schumaker, editors, Approximation Theory XIII: San Antonio 2010, volume 13 of Springer Proceedings
in Mathematics, pages 163-186. Springer New York, 2012.

[33] M. Lustig. Sparse MRI. PhD thesis, Stanford University, 2008.






[34] M. Lustig, D. L. Donoho, and J. M. Pauly. Sparse MRI: the application of compressed sensing for 
rapid MRI imaging. Magn. Reson. Imaging, 58(6):1182-1195, 2007. 

[35] M. Lustig, D. L. Donoho, J. M. Santos, and J. M. Pauly. Compressed Sensing MRI. IEEE Signal 
Process. Mag., 25(2):72-82, March 2008. 

[36] C. McDiarmid. Concentration. In Probabilistic methods for algorithmic discrete mathematics, vol- 
ume 16 of Algorithms Combin., pages 195-248. Springer, Berlin, 1998. 

[37] D. D.-Y. Po and M. N. Do. Directional multiscale modeling of images using the contourlet transform.
IEEE Trans. Image Proc., 15(6):1610-1620, June 2006.

[38] G. Puy, J. P. Marques, R. Gruetter, J. Thiran, D. Van De Ville, P. Vandergheynst, and Y. Wiaux. 
Spread spectrum Magnetic Resonance Imaging. IEEE Trans. Med. Imaging, 31(3):586-598, 2012. 

[39] G. Puy, P. Vandergheynst, and Y. Wiaux. On variable density compressive sampling. IEEE Signal 
Process. Letters, 18:595-598, 2011. 

[40] M. Rudelson. Random vectors in the isotropic position. J. Funct. Anal., 164(1):60-72, 1999.

[41] T. Strohmer. Measure what should be measured: progress and challenges in compressive sensing.
IEEE Signal Process. Letters, 19(12):887-893, 2012.

[42] V. Studer, J. Bobin, M. Chahid, H. Moussavi, E. Candes, and M. Dahan. Compressive fluorescence 
microscopy for biological and hyperspectral imaging. Submitted, 2011. 

[43] M. Talagrand. New concentration inequalities in product spaces. Invent. Math., 126(3):505-563, 
1996. 

[44] J. A. Tropp. On the conditioning of random subdictionaries. Appl. Comput. Harmon. Anal.,
25(1):1-24, 2008.

[45] Z. Wang and G. R. Arce. Variable density compressed image sampling. IEEE Trans. Image Proc.,
19(1):264-270, 2010.






